The asymptotic theory of various estimators based on Gaussian
likelihood has been developed for the unit root and near unit root
cases of a first-order moving average model. Previous studies of the
MA(1) unit root problem rely on the special autocovariance structure
of the MA(1) process, in which case the eigenvalues and eigenvectors
of the covariance matrix of the data vector have known analytical
forms. In this paper, we take a different approach: we first consider the
joint likelihood by including an augmented initial value as a parameter
and then recover the exact likelihood by integrating out the initial
value. This approach bypasses the difficulty of computing an explicit
decomposition of the covariance matrix and can be used to study unit
root behavior in moving averages beyond first order. The asymptotics
of the generalized likelihood ratio (GLR) statistic for testing unit
roots are also studied. The GLR test has operating characteristics
that are competitive with the locally best invariant unbiased (LBIU)
test of Tanaka for some local alternatives and dominates for all other
alternatives.

1. Introduction. In this paper we consider inference for moving average
models that possess one or more unit roots in the moving average polynomial.
To introduce the problem, we first consider the MA(1) model given by

(1.1) $X_t = Z_t - \theta_0 Z_{t-1}$,

where $\theta_0 \in \mathbb{R}$, $\{Z_t\}$ is a sequence of independent and identically distributed
(i.i.d.) random variables with $EZ_t = 0$, $EZ_t^2 = \sigma_0^2$ and density function $f_Z$.
The MA(1) model is invertible if and only if $|\theta_0| < 1$, since in this case $Z_t$
can be represented explicitly in terms of past values of $X_t$, that is,

$Z_t = \sum_{j=0}^{\infty} \theta_0^j X_{t-j}$.

Under this invertibility constraint, standard estimation procedures that
produce asymptotically normal estimates are readily available. For example,
if $\hat\theta$ represents the maximum likelihood estimator, found by maximizing the
Gaussian likelihood based on the data $X_1, \ldots, X_n$, then it is well known (see
Brockwell and Davis [6]) that

(1.2) $\sqrt{n}(\hat\theta - \theta_0) \stackrel{d}{\to} N(0, 1 - \theta_0^2)$.

From the form of the limiting variance in (1.2), the asymptotic behavior
of $\hat\theta$, let alone the scaling, is not immediately clear in the unit root case
corresponding to $\theta_0 = 1$.

In the case $f_Z$ is Gaussian, the parameters $\theta_0$ and $\sigma_0^2$ are not identifiable
without the constraint $|\theta_0| \le 1$. In particular, the profile Gaussian
log-likelihood, obtained by concentrating out the variance parameter, satisfies

(1.3) $L_n(\theta) = L_n(1/\theta)$.

It follows that $\theta = 1$ is a critical value of the profile likelihood, and hence
there is a positive probability that $\hat\theta = 1$ is indeed the maximum likelihood
estimator. If $\theta_0 = 1$, then it turns out that this probability does not vanish
asymptotically (see, e.g., Anderson and Takemura [1], Tanaka [21] and Davis
and Dunsmuir [10]). This phenomenon is referred to as the pile-up effect.
For the case that $\theta_0 = 1$ or is near one in the sense that $\theta_0 = 1 + \gamma/n$, it was
shown in Davis and Dunsmuir [10] that

$n(\hat\theta - \theta_0) \stackrel{d}{\to} \xi_\gamma$,

where $\xi_\gamma$ is a random variable with a discrete component at 0, corresponding
to the asymptotic pile-up effect, and a continuous component. Most of the 
early work on this problem was based on explicit knowledge of the eigen- 
vectors and eigenvalues of the covariance matrix for observations from an 
MA(1) process; see Anderson and Takemura [1]. Recently, Breidt et al. [4] 
and Davis and Song [13] looked at model (1.1) under the Laplace likelihood 
and the Gaussian likelihood without resorting to knowledge of the precise 
form of eigenvectors and eigenvalues of the covariance matrix. Instead they 
introduced an auxiliary variable, which acts like an initial value and can be 
integrated out to form the likelihood. 
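
The symmetry (1.3) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a zero-mean MA(1), evaluates the profile Gaussian log-likelihood up to an additive constant through an LDL' factorisation of the tridiagonal covariance structure, and compares the values at $\theta$ and $1/\theta$. The function names `ma1_profile_loglik` and `simulate_unit_root_ma1` are hypothetical.

```python
import math
import random


def ma1_profile_loglik(x, theta):
    """Profile Gaussian log-likelihood of a zero-mean MA(1), up to a constant.

    The covariance matrix divided by sigma^2 is tridiagonal with diagonal
    1 + theta^2 and off-diagonal -theta; it is factorised as L D L'.
    """
    n = len(x)
    diag = 1.0 + theta * theta
    d_prev = diag            # first pivot d_1
    e_prev = x[0]            # first forward-substituted residual e_1
    quad = e_prev * e_prev / d_prev
    logdet = math.log(d_prev)
    for i in range(1, n):
        l = -theta / d_prev                  # sub-diagonal entry of L
        d = diag - theta * theta / d_prev    # next pivot
        e = x[i] - l * e_prev                # solves L e = x
        quad += e * e / d
        logdet += math.log(d)
        d_prev, e_prev = d, e
    # sigma^2 has been concentrated out: sigma_hat^2 = quad / n
    return -0.5 * n * math.log(quad / n) - 0.5 * logdet


def simulate_unit_root_ma1(n, seed=0):
    """Simulate X_t = Z_t - Z_{t-1}, t = 1..n, with standard normal Z_t."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [z[t] - z[t - 1] for t in range(1, n + 1)]


x = simulate_unit_root_ma1(200, seed=1)
gap = abs(ma1_profile_loglik(x, 0.7) - ma1_profile_loglik(x, 1.0 / 0.7))
```

Under these assumptions `gap` is zero up to rounding error, which is the reflection property that makes $\theta = 1$ a critical point of the profile likelihood.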

With a couple of exceptions, most previous work dealt exclusively with the
zero-mean case. Sargan and Bhargava [17] and Shephard [18] showed that 
for the nonzero mean case, the so-called pile-up effect is more severe than the 
zero mean case. Chen, Davis and Song [8] extended the results from Davis 
and Dunsmuir [10] to regression models with errors from a noninvertible 
MA(1) process. It is shown that, with a mean term present in the model, 
the pile-up probability goes up to more than 0.95. 

The MA unit root problem can arise in many modeling contexts, especially
if a time series exhibits trend and seasonality. For example, in personal
communication, Richard Smith has mentioned the presence of a unit root in
modeling some environmental time series related to climate change [19]. After
detrending and fitting an ARMA model to the time series, Smith noticed
that the MA component appeared to have a unit root. One explanation for
this phenomenon is that detrending often involves the application of a
high-pass filter to the time series. In particular, the filter diminishes or obliterates
any power in the time series at low frequencies (including the 0 frequency).
Consequently, the detrended data will have a spectrum with 0 power at
frequency 0, which can only be fitted with an ARMA process that has a unit root
in the MA component. While we only consider unit roots in higher order
moving averages in this paper, we believe the techniques developed here will
be applicable in the more general framework of an ARMA model. This will be
the subject of future investigation.
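
As a small illustration of the high-pass argument, the sketch below evaluates the squared gain $|1 + c_1 e^{-i\omega} + \cdots + c_q e^{-iq\omega}|^2$ of a moving average filter. The function name `ma_power_transfer` is hypothetical, and first differencing is used here only as an assumed example of a detrending filter.

```python
import cmath


def ma_power_transfer(coeffs, omega):
    """Squared gain |1 + c_1 e^{-i w} + ... + c_q e^{-i q w}|^2 of an MA filter."""
    value = 1.0 + sum(c * cmath.exp(-1j * (k + 1) * omega)
                      for k, c in enumerate(coeffs))
    return abs(value) ** 2


# First differencing (1 - B) removes all power at frequency 0 ...
gain_difference = ma_power_transfer([-1.0], 0.0)
# ... while an invertible MA(1) filter (1 - 0.5 B) does not.
gain_invertible = ma_power_transfer([-0.5], 0.0)
```

The gain at frequency 0 equals $(1 + c_1 + \cdots + c_q)^2$, so it vanishes exactly when the MA polynomial has a root at 1.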

In this paper, we will use the stochastic approaches described in [4]
and [13] to first study the case when there is a regression component in
the time series and the errors are generated from a noninvertible MA(1). A vital
issue in extending these results to higher order MA models is the scaling
required for the auxiliary variable. The scaling used for the regression
problem in the MA(1) case provides insight into the way in which the auxiliary
variable should be scaled in the higher order case. Quite surprisingly, when
there is only one unit root in the MA(2) process, that is,

(1.4) $X_t = Z_t + c_1 Z_{t-1} + c_2 Z_{t-2}$,

where $-c_1 - c_2 = 1$ and $\{Z_t\} \sim$ i.i.d. $(0, \sigma_0^2)$, the asymptotic distribution of
the maximum likelihood estimator $(\hat c_1, \hat c_2)'$ is exactly the same as in the invertible
MA(2) case; see [6]. That is,

(1.5) $\sqrt{n}\begin{pmatrix} \hat c_1 - c_1 \\ \hat c_2 - c_2 \end{pmatrix} \stackrel{d}{\to} N\left(0, \begin{pmatrix} 1 - c_2^2 & c_1(1 - c_2) \\ c_1(1 - c_2) & 1 - c_2^2 \end{pmatrix}\right)$.

One difference, however, is that $\hat c_1$ and $\hat c_2$ are now totally dependent
asymptotically [$c_1(1 - c_2) = -(1 - c_2^2)$].
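
The degeneracy of the limiting covariance on the boundary $-c_1 - c_2 = 1$ can be seen directly. The sketch below is illustrative only: the helper names are hypothetical, and it simply evaluates the matrix in (1.5) and its implied correlation.

```python
import math


def ma2_asymptotic_cov(c1, c2):
    """Limiting covariance matrix of sqrt(n) * (c_hat - c) as written in (1.5)."""
    off = c1 * (1.0 - c2)
    return [[1.0 - c2 ** 2, off], [off, 1.0 - c2 ** 2]]


def correlation(cov):
    """Correlation implied by a 2 x 2 covariance matrix."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])


def determinant(cov):
    return cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]


# A point on the boundary -c1 - c2 = 1 (one unit root): c2 = 0.3, c1 = -1.3.
boundary = ma2_asymptotic_cov(-1.3, 0.3)
# An interior (invertible) point for comparison.
interior = ma2_asymptotic_cov(-0.5, 0.3)
```

On the boundary the determinant is zero and the correlation is $-1$, whereas in the interior the matrix is nonsingular.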

As seen from (1.3), the first derivative of the profile likelihood function
is always 0 when $\theta = 1$. Therefore, the development of typical score tests or
Wald tests is intractable in this case. Davis, Chen and Dunsmuir [9] used
the asymptotic result from [10] to develop a test of $H_0\colon \theta = 1$ based on $\hat\theta_{\mathrm{MLE}}$
and the generalized likelihood ratio. Interestingly, we will see that the
estimator of the unit root in the MA(2) case has the same limit distribution
as the corresponding estimator in the MA(1) case. Thus, we can extend the
methods used in the MA(1) case to test for unit roots in the MA(2) case.

The paper is organized as follows. In Section 2, we demonstrate our 
method of proof applied to the MA(1) model with regression. This case 
plays a key role in the extension to higher order MAs. Section 3 contains 
the results for the unit root problem in the MA(2) case. In Section 4, we 
compare likelihood based tests with Tanaka's locally best invariant and
unbiased (LBIU) test [20] for testing the presence of a unit root. It is shown
that the likelihood ratio test performs quite well in comparison to the LBIU
test. In Section 5, numerical simulation results are presented to illustrate
the theory of Section 3. In Section 6, there is a brief discussion that
connects the auxiliary variables in higher order MAs with terms in a regression
model with MA(1) errors. Finally, in Section 7, the procedure for handling
the MA($q$) case with $q \ge 3$ is outlined. It is shown that the tools used in
the MA(1) and MA(2) cases are still applicable and are, in fact, sufficient
in dealing with higher order cases.

2. MA(1) with nonzero mean. In this section, we will extend the methods
of Breidt et al. [4] and Davis and Song [13] to a regression model with
MA(1) errors. These results turn out to have connections with the
asymptotics in the higher order unit root cases (see Section 6). First, consider the
model

(2.1) $X_t = \sum_{k=0}^{p} b_{k0} f_k(t/n) + Z_t - \theta_0 Z_{t-1}$,

where $\{Z_t\}$ is defined as in (1.1), $\theta_0 = 1$, $b_{k0}$, $k = 0, \ldots, p$, are regression
coefficients and $f_k(t/n)$, $k = 0, \ldots, p$, are covariates at time $t$. Notice that the
covariates $f_k$ are assumed to be functions on $[0,1]$. Note that the
detrended series $Y_t = X_t - \sum_{k=0}^{p} b_{k0} f_k(t/n)$ has exactly the same likelihood
as the one for the zero-mean case. As shown in [13], by concentrating out the
scale parameter $\sigma$, maximizing the joint Gaussian likelihood is equivalent to
minimizing the following objective function:

(2.2) $l_n(b, \theta, z_{\mathrm{init}}) = \sum_{t=0}^{n} z_t^2$ for $|\theta| \le 1$,

where $b = (b_0, \ldots, b_p)'$, $z_0 = z_{\mathrm{init}}$, and, with $Y_i$ now computed from the candidate coefficients $b_k$, $z_i$ is given by

$z_i = Y_i + \theta Y_{i-1} + \cdots + \theta^{i-1} Y_1 + \theta^{i} z_{\mathrm{init}}$

$\quad = \Bigl(X_i - \sum_{k=0}^{p} b_k f_k(i/n)\Bigr) + \theta\Bigl(X_{i-1} - \sum_{k=0}^{p} b_k f_k((i-1)/n)\Bigr) + \cdots + \theta^{i-1}\Bigl(X_1 - \sum_{k=0}^{p} b_k f_k(1/n)\Bigr) + \theta^{i} z_{\mathrm{init}}$

$\quad = \Bigl(Z_i - Z_{i-1} + \sum_{k=0}^{p} b_{k0} f_k(i/n) - \sum_{k=0}^{p} b_k f_k(i/n)\Bigr) + \cdots + \theta^{i-1}\Bigl(Z_1 - Z_0 + \sum_{k=0}^{p} b_{k0} f_k(1/n) - \sum_{k=0}^{p} b_k f_k(1/n)\Bigr) + \theta^{i} z_{\mathrm{init}}$

$\quad = Z_i - (1 - \theta)\sum_{j=0}^{i-1} \theta^{i-1-j} Z_j - \theta^{i}(Z_0 - z_{\mathrm{init}}) - \sum_{k=0}^{p} (b_k - b_{k0}) \sum_{j=1}^{i} \theta^{i-j} f_k(j/n)$

$\quad := Z_i - w_i$.
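
The identity $z_i = Z_i - w_i$ can be verified numerically in the simplest case $p = 0$, $f_0 \equiv 1$. The sketch below is an illustration under that assumption, with hypothetical function names: it computes $z_i$ once by the recursion $z_i = (X_i - b) + \theta z_{i-1}$ and once from the closed form for $w_i$.

```python
import random


def simulate(n, b00, seed=0):
    """Return (Z_0..Z_n, X) with X_t = b00 + Z_t - Z_{t-1}, i.e. theta_0 = 1."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    x = [None] + [b00 + z[t] - z[t - 1] for t in range(1, n + 1)]  # x[0] unused
    return z, x


def residuals_recursive(x, b, theta, z_init):
    """z_0 = z_init and z_i = (X_i - b) + theta * z_{i-1}."""
    res = [z_init]
    for t in range(1, len(x)):
        res.append((x[t] - b) + theta * res[-1])
    return res


def residuals_closed_form(z, b, b00, theta, z_init):
    """z_i = Z_i - w_i with w_i as in the last line of the display above."""
    out = []
    for i in range(len(z)):
        w = (1.0 - theta) * sum(theta ** (i - 1 - j) * z[j] for j in range(i))
        w += theta ** i * (z[0] - z_init)
        w += (b - b00) * sum(theta ** (i - j) for j in range(1, i + 1))
        out.append(z[i] - w)
    return out


z_true, x_obs = simulate(40, b00=2.0, seed=3)
rec = residuals_recursive(x_obs, b=2.3, theta=0.95, z_init=0.1)
closed = residuals_closed_form(z_true, b=2.3, b00=2.0, theta=0.95, z_init=0.1)
max_gap = max(abs(u - v) for u, v in zip(rec, closed))
```

The two computations agree up to rounding error, which is the decomposition used to study the objective function (2.2).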

As in [13], we adopt the parametrization for $\theta$ and $z_{\mathrm{init}}$ given by

$\theta = 1 + \frac{\beta}{n}$ and $z_{\mathrm{init}} = Z_0 + \frac{\sigma_0 \alpha}{\sqrt{n}}$.

Further set

(2.3) $b_k = b_{k0} + \frac{\sigma_0 \eta_k}{n^{3/2}}$.

Note that (2.3) essentially characterizes the convergence rate of the
estimated $b_k$ to its true value $b_{k0}$. At first glance, this parameterization may look
odd since it depends on the true parameter values, which are unavailable.
This form of reparameterization is used only for deriving the asymptotic
theory of the maximum likelihood estimators and not for estimation purposes.
One notes that $\beta = n(\theta - 1)$, $\eta_k = n^{3/2}(b_k - b_{k0})/\sigma_0$, so that the asymptotics
of the MLE $\hat\theta$ and $\hat b_k$ of the associated parameters are found by the
limiting behavior of $\hat\beta = n(\hat\theta - 1)$, $\hat\eta_k = n^{3/2}(\hat b_k - b_{k0})/\sigma_0$. Hence, it is not necessary
to know the true values in this analysis. The scaling $n^{3/2}$ for the
regression coefficients is an artifact of the assumption, imposed on the problem, that the regressors take the
form $f_k(t/n)$. This also results in a clean
expression for the limit.

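Under the $(\eta, \beta, \alpha)$ parameterization, it is easily seen [13] that minimizing
$l_n(b, \theta, z_{\mathrm{init}})$ with respect to $b$, $\theta$, $z_{\mathrm{init}}$ is equivalent to minimizing the function

(2.4) $U_n(\eta, \beta, \alpha) = \frac{1}{\sigma_0^2}\,[\,l_n(b, \theta, z_{\mathrm{init}}) - l_n(b_0, 1, Z_0)\,]$

with respect to $\eta$, $\beta$ and $\alpha$. Then using the weak convergence results in Davis
and Song [13],

$U_n(\eta, \beta, \alpha) = \frac{1}{\sigma_0^2}\sum_{i=0}^{n} (z_i^2 - Z_i^2) = -2\sum_{i=0}^{n} \frac{w_i Z_i}{\sigma_0^2} + \sum_{i=0}^{n} \frac{w_i^2}{\sigma_0^2}$

$\quad \stackrel{d}{\to} 2\beta\int_0^1\!\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\alpha\int_0^1 e^{\beta s}\,dW(s) - 2\sum_{k=0}^{p} \eta_k \int_0^1\Bigl(\int_0^s e^{\beta(s-t)} f_k(t)\,dt\Bigr) dW(s)$

$\qquad + \int_0^1\Bigl(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \alpha e^{\beta s} - \sum_{k=0}^{p} \eta_k \int_0^s e^{\beta(s-t)} f_k(t)\,dt\Bigr)^2 ds$

$\quad := U(\eta, \beta, \alpha)$,

where "$\stackrel{d}{\to}$" indicates weak convergence on $C(\mathbb{R}^{p+1} \times (-\infty, 0] \times \mathbb{R})$.
Throughout this paper, when referring to convergence of stochastic processes on $C(\mathbb{R}^k)$,
the notation "$\stackrel{d}{\to}$" ("$\stackrel{p}{\to}$") means convergence in distribution (probability)
on $C(\mathbb{K})$, where $\mathbb{K}$ is any compact set in $\mathbb{R}^k$.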

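As a special case of a polynomial, set $f_k(t) = t^k$. In this case, the limiting
process $U(\eta, \beta, \alpha)$ is

$U(\eta, \beta, \alpha) = 2\beta\int_0^1\!\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\alpha\int_0^1 e^{\beta s}\,dW(s) - 2\sum_{k=0}^{p} \eta_k \int_0^1\Bigl(\int_0^s e^{\beta(s-t)} t^k\,dt\Bigr) dW(s)$

$\qquad + \int_0^1\Bigl(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \alpha e^{\beta s} - \sum_{k=0}^{p} \eta_k \int_0^s e^{\beta(s-t)} t^k\,dt\Bigr)^2 ds$.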

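From now on we consider the simple case of just a nonzero mean, that is,
$p = 0$ and $f_0(t) = 1$. The formula further simplifies to

(2.5) $U(\eta_0, \beta, \alpha) = 2\beta\int_0^1\!\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\alpha\int_0^1 e^{\beta s}\,dW(s) - 2\eta_0\int_0^1 \frac{e^{\beta s} - 1}{\beta}\,dW(s)$

$\qquad + \int_0^1\Bigl(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \alpha e^{\beta s} - \eta_0\,\frac{e^{\beta s} - 1}{\beta}\Bigr)^2 ds$.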
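
For reference, the only computation needed to pass from the general limit to the nonzero-mean case is the inner integral with $f_0 \equiv 1$. The worked step below is a sketch of that calculation, including the value at $\beta = 0$ that is used when the unit root is known.

```latex
% Inner integral in the limit process when f_0(t) = 1:
\int_0^s e^{\beta(s-t)}\,dt
  = \left[-\frac{e^{\beta(s-t)}}{\beta}\right]_{t=0}^{t=s}
  = \frac{e^{\beta s}-1}{\beta}, \qquad \beta \neq 0,
% and by continuity at beta = 0:
\qquad
\lim_{\beta \to 0} \frac{e^{\beta s}-1}{\beta} = s .
```

In particular, at $\beta = 0$ the regression term in the limit reduces to $\eta_0 s$, so the $\eta_0$-dependent part of the limit becomes $-2\eta_0\int_0^1 s\,dW(s)$ in the stochastic integral and $(\alpha - \eta_0 s)^2$ in the squared term.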



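As shown in [13], one can recover the exact likelihood by integrating out
the initial parameter effects. More specifically, since

$f(x_n, z_{\mathrm{init}}) = \prod_{t=0}^{n} f_Z(z_t) = \Bigl(\frac{1}{\sqrt{2\pi\sigma^2}}\Bigr)^{n+1} \exp\Bigl\{-\frac{\sum_{t=0}^{n} z_t^2}{2\sigma^2}\Bigr\}$,

integrating out the augmented variable $z_{\mathrm{init}}$ yields

(2.6) $f(x_n) = \int_{-\infty}^{+\infty} f(x_n, z_{\mathrm{init}})\,dz_{\mathrm{init}} = \int_{-\infty}^{+\infty} \Bigl(\frac{1}{\sqrt{2\pi\sigma^2}}\Bigr)^{n+1} \exp\Bigl\{-\frac{\sum_{t=0}^{n} z_t^2}{2\sigma^2}\Bigr\}\,dz_{\mathrm{init}}$.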

A similar argument as in [13] then shows that, by profiling out the variance
parameter $\sigma^2$, the exact profile log-likelihood $L_n(\eta_0, \beta)$ has the following
property:

(2.7) $L_n(\eta_0, \beta) - L_n(0, 0) \stackrel{d}{\to} L^*(\eta_0, \beta) = \log\int_{-\infty}^{\infty} \exp\Bigl\{-\frac{U(\eta_0, \beta, \alpha)}{2}\Bigr\}\,d\alpha - \log\int_{-\infty}^{\infty} \exp\Bigl\{-\frac{U(0, 0, \alpha)}{2}\Bigr\}\,d\alpha$.

The weak convergence results on $C(\mathbb{R}^2)$ in (2.7) can be used to show
convergence in distribution of a sequence of local maximizers of the objective
functions $L_n$ to the maximizer of the limit process $L^*$, provided the latter is unique
almost surely. This is the content of Remark 1 (see also Lemma 2.2) of Davis,
Knight and Liu [12], a version of which we state here for ease of reference.

Remark 2.1. Suppose $\{L_n(\cdot)\}$ is a sequence of stochastic processes
which converge in distribution to $L(\cdot)$ on $C(\mathbb{R}^k)$. If $L$ has a unique
maximizer $\tilde\beta$ a.s., then there exists a sequence of local maximizers $\{\hat\beta_n\}$ of $\{L_n\}$
that converge in distribution to $\tilde\beta$. Note that this is consistent with many
of the statements made in the classical theory for maximum likelihood (see,
e.g., Theorem 7.1.1 of Lehmann [15]) and for inference in nonstandard time
series models; see Theorems 8.2.1 and 8.6.1 in Rosenblatt [16], Breidt et
al. [5], Andrews et al. [3] and Andrews et al. [2]. In some cases, for
example, if the $\{L_n\}$ have concave sample paths, this can be strengthened to
convergence of the global maximizers of $L_n$. See also Davis, Chen and
Dunsmuir [9], Davis and Dunsmuir [11], Breidt et al. [5] for examples of other
cases when $\{L_n\}$ are not concave.

Returning to our example, in the case when $\theta_0 = 1$, that is, $\beta = 0$,
the limit of the exact likelihood is $L^*(\eta_0, \beta = 0)$. This corresponds to the
situation of inference about the mean term when it is known that the driving
noise is an MA(1) process with a unit root. Since the Gaussian likelihood
is a quadratic function of the regression coefficients, $L^*(\eta_0, \beta = 0)$ is a quadratic
function in $\eta_0$. Applying Remark 2.1, we obtain that the MLE $\hat\eta_{0,n}$ converges
in distribution to $\tilde\eta_0$, the global maximizer of $L^*(\eta_0, \beta = 0)$. In particular, $\tilde\eta_0$
is the value that makes $\frac{\partial}{\partial \eta_0} L^*(\eta_0, \beta = 0) = 0$. Here

$\frac{\partial}{\partial \eta_0} L^*(\eta_0, \beta = 0) = \frac{\int_{-\infty}^{\infty} \exp\{-U(\eta_0, \beta = 0, \alpha)/2\}\,\bigl(-(1/2)\,\partial U(\eta_0, \beta = 0, \alpha)/\partial \eta_0\bigr)\,d\alpha}{\int_{-\infty}^{\infty} \exp\{-U(\eta_0, \beta = 0, \alpha)/2\}\,d\alpha}$,

where

$U(\eta_0, \beta = 0, \alpha) = 2\alpha W(1) - 2\eta_0 \int_0^1 s\,dW(s) + \int_0^1 (\alpha - s\eta_0)^2\,ds$

and

$\frac{\partial}{\partial \eta_0} U(\eta_0, \beta = 0, \alpha) = 2\eta_0 \int_0^1 s^2\,ds - 2\int_0^1 s\,dW(s) - 2\alpha \int_0^1 s\,ds$.

Solving $\frac{\partial}{\partial \eta_0} L^*(\eta_0, \beta = 0) = 0$, we find that

(2.8) $\tilde\eta_0 = 12\int_0^1 s\,dW(s) - 6W(1) \sim N(0, 12)$

and hence

(2.9) $n^{3/2}(\hat b_{0,n} - b_{00}) = \sigma_0 \hat\eta_{0,n} \stackrel{d}{\to} N(0, 12\sigma_0^2)$.

This counter-intuitive result was also obtained earlier by Chen et al. [8]. It
says that the MLE of the mean term in the process is asymptotically
normal, but with convergence rate $n^{3/2}$. Notice that,
even if one does not know the true value of $\theta$, the MLE of the mean term
would still behave very much like (2.9) due to the large pile-up effect in this
case. However, the MLE is not asymptotically normal if both $b_0$ and $\theta$ are
estimated.
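
The variance in (2.8) follows from the Itô isometry: writing $W(1) = \int_0^1 dW(s)$ gives $\tilde\eta_0 = \int_0^1 (12s - 6)\,dW(s)$, whose variance is $\int_0^1 (12s - 6)^2\,ds$. The sketch below (hypothetical function name, midpoint rule chosen only for simplicity) evaluates that integral numerically.

```python
def ito_variance(integrand, steps=10000):
    """Variance of int_0^1 g(s) dW(s), i.e. int_0^1 g(s)^2 ds, by the midpoint rule."""
    h = 1.0 / steps
    return sum(integrand((k + 0.5) * h) ** 2 for k in range(steps)) * h


# eta_tilde_0 = 12 * int s dW(s) - 6 * W(1) = int (12 s - 6) dW(s)
variance_eta0 = ito_variance(lambda s: 12.0 * s - 6.0)
```

The value is 12 up to discretisation error, in agreement with the $N(0, 12)$ limit.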

3. MA(2) with unit roots. The above approach, which also works in the
invertible case, does not rely on detailed knowledge of the form of the
eigenvectors and eigenvalues of the covariance matrix. Hence it has the potential
to work in higher order models where the eigenvector and eigenvalue
structure is not known explicitly. We will concentrate on the MA(2) process in
this section and further illustrate our methods.

Fig. 1. The $\nabla$ region defined by $-c_1 - c_2 \le 1$, $c_1 - c_2 \le 1$, $|c_2| \le 1$.

In the following, we consider the model given in (1.4), where the
parameters $(c_1, c_2) \in \nabla$, the triangular shaped region depicted in Figure 1. The
interior of this region corresponds to the invertibility region of the
parameter space. Note that the triangular region is separated into complex roots
and real roots of the MA polynomial $1 + c_1 z + c_2 z^2$ by the quadratic curve
$c_1^2 - 4c_2 = 0$.

If the parameters are on the boundary of the $\nabla$ region, this indicates the
presence of unit roots. Otherwise, the model is said to be invertible; see also
Brockwell and Davis [6]. Model (1.4) can also be represented in terms of the
roots of the MA polynomial by

$X_t = (1 + c_1 B + c_2 B^2) Z_t = (1 - \theta_0 B)(1 - \alpha_0 B) Z_t$,

where $c_1 = -\theta_0 - \alpha_0$ and $c_2 = \theta_0 \alpha_0$.
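
The map from the roots to the coefficients, and the membership test for the triangular region, can be sketched as follows. This is an illustration for real roots only; the function names and the tolerance are hypothetical choices.

```python
def ma2_coeffs_from_roots(theta, alpha):
    """Coefficients of (1 - theta B)(1 - alpha B) = 1 + c1 B + c2 B^2."""
    return -(theta + alpha), theta * alpha


def in_closed_triangle(c1, c2, tol=1e-12):
    """True if (c1, c2) satisfies -c1 - c2 <= 1, c1 - c2 <= 1 and |c2| <= 1."""
    return (-c1 - c2 <= 1.0 + tol) and (c1 - c2 <= 1.0 + tol) and (abs(c2) <= 1.0 + tol)


def on_unit_root_boundary(c1, c2, tol=1e-12):
    """True if the point lies on the edge -c1 - c2 = 1 (a root equal to 1)."""
    return in_closed_triangle(c1, c2, tol) and abs(-c1 - c2 - 1.0) <= 1e-9


c1, c2 = ma2_coeffs_from_roots(1.0, 0.4)        # one unit root, |alpha_0| < 1
inside = in_closed_triangle(c1, c2)             # boundary points belong to the closed region
outside = in_closed_triangle(*ma2_coeffs_from_roots(1.2, 0.4))
```

A root of $1$ puts $(c_1, c_2)$ on the edge $-c_1 - c_2 = 1$, while a real root larger than $1$ in absolute value leaves the closed triangle.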



3.1. Case 1: $|\alpha_0| < 1$ and $\theta_0 = 1$. This case corresponds to the situation
of only one unit root in the MA polynomial, that is, the boundary AB in
Figure 1. Let $L_n(\theta, \alpha)$ be the profile likelihood of an MA(2) process. Again,
we adopt the parametrization

$\theta = 1 + \frac{\beta}{n}$ and $\alpha = \alpha_0 + \frac{\gamma}{\sqrt{n}}$, $\quad \beta \le 0$, $\gamma \in \mathbb{R}$.

For convenience, define the intermediate process $Y_t = (1 - \alpha_0 B) Z_t$ and
observe that

$X_t = (1 - \theta_0 B)(1 - \alpha_0 B) Z_t = (1 - \theta_0 B) Y_t$.

In the MA(2) case, two augmented initial variables $Z_{\mathrm{init}}$ and $Y_{\mathrm{init}}$ are needed.
These initial variables and the joint likelihood have a simple form, that is,

(3.1) $Z_{\mathrm{init}} = Z_{-1}$ and $Y_{\mathrm{init}} = Z_0 - \alpha_0 Z_{\mathrm{init}}$,

$f_{X, Y_{\mathrm{init}}, Z_{\mathrm{init}}}(x_n, y_{\mathrm{init}}, z_{\mathrm{init}}) = f_{Y, Y_{\mathrm{init}}, Z_{\mathrm{init}}}(y_n, y_{\mathrm{init}}, z_{\mathrm{init}}) = f_{Z, Z_{\mathrm{init}}}(z_n, z_{\mathrm{init}}) = \prod_{j=-1}^{n} f_Z(z_j)$.
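As in the MA(1) case, the key to our method is to
calculate the formula for the residual $r_i := Z_i - z_i$, which can be obtained
from

$z_i = y_i + \alpha y_{i-1} + \cdots + \alpha^{i-1} y_1 + \alpha^{i} y_{\mathrm{init}} + \alpha^{i+1} z_{\mathrm{init}}$

$\quad = (X_i + \theta X_{i-1} + \cdots + \theta^{i-1} X_1 + \theta^{i} y_{\mathrm{init}}) + \cdots + \alpha^{i-1}(X_1 + \theta y_{\mathrm{init}}) + \alpha^{i} y_{\mathrm{init}} + \alpha^{i+1} z_{\mathrm{init}}$

$\quad = \sum_{j=1}^{i} \frac{\theta^{i+1-j} - \alpha^{i+1-j}}{\theta - \alpha}\,X_j + \frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,y_{\mathrm{init}} + \alpha^{i+1} z_{\mathrm{init}}$

(3.2) $\quad = Z_i - \frac{(\theta_0 - \theta)(\theta - \alpha_0)}{\theta - \alpha}\sum_{j=-1}^{i-1} \theta^{i-1-j} Z_j - \frac{(\alpha_0 - \alpha)(\theta_0 - \alpha)}{\theta - \alpha}\sum_{j=-1}^{i-1} \alpha^{i-1-j} Z_j + \frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,(y_{\mathrm{init}} - Y_0)$

$\qquad + \alpha^{i+1}(z_{\mathrm{init}} - Z_{-1}) + (\theta_0 - \theta)\,\frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,Z_{-1}$

(3.3) $\quad = Z_i - r_i$,

where the fourth equality (3.2) comes from the fact that $X_j = Z_j - (\theta_0 + \alpha_0) Z_{j-1} + \theta_0 \alpha_0 Z_{j-2}$
and $Y_0 = Z_0 - \alpha_0 Z_{-1}$. Therefore, the residuals $r_i$ are
given by

(3.4) $r_i = \frac{(\theta_0 - \theta)(\theta - \alpha_0)}{\theta - \alpha}\sum_{j=-1}^{i-1} \theta^{i-1-j} Z_j + \frac{(\alpha_0 - \alpha)(\theta_0 - \alpha)}{\theta - \alpha}\sum_{j=-1}^{i-1} \alpha^{i-1-j} Z_j - \frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,(y_{\mathrm{init}} - Y_0)$

$\qquad - \alpha^{i+1}(z_{\mathrm{init}} - Z_{-1}) - (\theta_0 - \theta)\,\frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,Z_{-1}$.

Notice that the residuals $r_i$ no longer have a neat form as in the MA(1) case.
This is what makes the MA(2) case more interesting yet more complicated.
In the following calculations, let

$y_{\mathrm{init}} = Y_0 + \frac{\sigma_0 \eta_1}{\sqrt{n}}$ and $z_{\mathrm{init}} = Z_{-1} + \frac{\sigma_0 \eta_2}{\sqrt{n}}$.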
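
The residual decomposition for the MA(2) model can be checked numerically. The sketch below is an illustration with hypothetical names and arbitrary parameter values: it computes $z_i$ from the two recursions $y_j = X_j + \theta y_{j-1}$ (with $y_0 = y_{\mathrm{init}}$) and $z_j = y_j + \alpha z_{j-1}$ (with $z_{-1} = z_{\mathrm{init}}$), and compares the result with $Z_i - r_i$, where $r_i$ collects a $\theta$-sum, an $\alpha$-sum, the two initial-value terms and a remainder in $Z_{-1}$.

```python
import random


def ma2_residual_gap(n=30, theta0=1.0, alpha0=0.5, theta=0.97, alpha=0.45,
                     y_init=0.3, z_init=-0.2, seed=0):
    """Largest |(Z_i - r_i) - z_i| over i = -1..n for one simulated path."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]

    def Z(j):
        # noise[k] stores Z_{k-1}, so Z(-1) = noise[0], ..., Z(n) = noise[n + 1]
        return noise[j + 1]

    X = {j: Z(j) - (theta0 + alpha0) * Z(j - 1) + theta0 * alpha0 * Z(j - 2)
         for j in range(1, n + 1)}

    # Recursive computation of the candidate residuals.
    y = {0: y_init}
    for j in range(1, n + 1):
        y[j] = X[j] + theta * y[j - 1]
    z = {-1: z_init}
    for j in range(0, n + 1):
        z[j] = y[j] + alpha * z[j - 1]

    # Closed form for r_i, term by term.
    Y0 = Z(0) - alpha0 * Z(-1)
    worst = 0.0
    for i in range(-1, n + 1):
        h = (theta ** (i + 1) - alpha ** (i + 1)) / (theta - alpha)
        s_theta = sum(theta ** (i - 1 - j) * Z(j) for j in range(-1, i))
        s_alpha = sum(alpha ** (i - 1 - j) * Z(j) for j in range(-1, i))
        r = ((theta0 - theta) * (theta - alpha0) / (theta - alpha) * s_theta
             + (alpha0 - alpha) * (theta0 - alpha) / (theta - alpha) * s_alpha
             - h * (y_init - Y0)
             - alpha ** (i + 1) * (z_init - Z(-1))
             - (theta0 - theta) * h * Z(-1))
        worst = max(worst, abs((Z(i) - r) - z[i]))
    return worst


gap = ma2_residual_gap()
```

Under these assumptions the gap is at the level of rounding error for any admissible choice of the candidate values, as long as $\theta \ne \alpha$.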



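With a similar argument as in [13], we opt to minimize the objective function

(3.5) $U_n(\beta, \gamma, \eta_1, \eta_2) = -2\sum_{i=-1}^{n} \frac{r_i Z_i}{\sigma_0^2} + \sum_{i=-1}^{n} \frac{r_i^2}{\sigma_0^2}$.

First note that $r_i = A_i + B_i + C_i + D_i$, where

$A_i = \frac{(\theta_0 - \theta)(\theta - \alpha_0)}{\theta - \alpha}\sum_{j=-1}^{i-1} \theta^{i-1-j} Z_j - \frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,(y_{\mathrm{init}} - Y_0)$,

$B_i = \frac{(\alpha_0 - \alpha)(\theta_0 - \alpha)}{\theta - \alpha}\sum_{j=-1}^{i-1} \alpha^{i-1-j} Z_j$,

$C_i = -\alpha^{i+1}(z_{\mathrm{init}} - Z_{-1})$ and $D_i = -(\theta_0 - \theta)\,\frac{\theta^{i+1} - \alpha^{i+1}}{\theta - \alpha}\,Z_{-1}$.
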
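To determine the weak limit of $-2\sum_{i=-1}^{n} r_i Z_i/\sigma_0^2$ in (3.5) in the continuous
function space, note that

$-2\sum_{i} \frac{A_i Z_i}{\sigma_0^2} = 2\,\frac{(\theta - \theta_0)(\theta - \alpha_0)}{\theta - \alpha}\sum_{i}\sum_{j=-1}^{i-1} \theta^{i-1-j}\,\frac{Z_j Z_i}{\sigma_0^2} + \frac{2\eta_1}{\sqrt{n}(\theta - \alpha)}\sum_{i} \theta^{i+1}\,\frac{Z_i}{\sigma_0} - \frac{2\eta_1}{\sqrt{n}(\theta - \alpha)}\sum_{i} \alpha^{i+1}\,\frac{Z_i}{\sigma_0}$

$\quad = \frac{2\beta(1 - \alpha_0 + \beta/n)}{1 - \alpha_0 + \beta/n - \gamma/\sqrt{n}}\;\frac{1}{n}\sum_{i}\sum_{j=-1}^{i-1} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i-1-j} \frac{Z_j Z_i}{\sigma_0^2} + \frac{2\eta_1}{1 - \alpha_0 + \beta/n - \gamma/\sqrt{n}}\;\frac{1}{\sqrt{n}}\sum_{i} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i+1} \frac{Z_i}{\sigma_0}$

$\qquad - \frac{2\eta_1}{1 - \alpha_0 + \beta/n - \gamma/\sqrt{n}}\;\frac{1}{\sqrt{n}}\sum_{i} \Bigl(\alpha_0 + \frac{\gamma}{\sqrt{n}}\Bigr)^{i+1} \frac{Z_i}{\sigma_0}$

(3.6) $\quad \stackrel{d}{\to} 2\beta\int_0^1\!\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + \frac{2\eta_1}{1 - \alpha_0}\int_0^1 e^{\beta s}\,dW(s)$,

where the last term disappears in the limit due to the fact that $|\alpha_0| < 1$.
Similarly, we have

$-2\sum_{i} \frac{B_i Z_i}{\sigma_0^2} = 2\,\frac{(\alpha - \alpha_0)(\theta_0 - \alpha)}{\theta - \alpha}\sum_{i}\sum_{j=-1}^{i-1} \alpha^{i-1-j}\,\frac{Z_j Z_i}{\sigma_0^2}$

$\quad = \frac{2\gamma(1 - \alpha_0 - \gamma/\sqrt{n})}{1 - \alpha_0 + \beta/n - \gamma/\sqrt{n}}\;\frac{1}{\sqrt{n}}\sum_{i}\sum_{j=-1}^{i-1} \Bigl(\alpha_0 + \frac{\gamma}{\sqrt{n}}\Bigr)^{i-1-j} \frac{Z_j Z_i}{\sigma_0^2}$

(3.7) $\quad = 2\gamma\,\frac{1}{\sqrt{n}}\sum_{i}\sum_{j=-1}^{i-1} \alpha_0^{i-1-j}\,\frac{Z_j Z_i}{\sigma_0^2} + o_p(1)$

(3.8) $\quad \stackrel{d}{\to} 2\gamma N$,

where $N \sim N(0, 1/(1 - \alpha_0^2))$. The third equality holds because $|\alpha_0|$ is strictly
smaller than 1, and $o_p(1)$ is uniform in $\gamma$ on any compact set of $\mathbb{R}$. The
weak convergence from (3.7) to (3.8) follows from the martingale central limit
theorem; see Hall and Heyde [14]. It can also be shown that $N$ and the $W(t)$
process from (3.6) are independent; see Theorem 2.2 in Chan and Wei [7].
Following similar arguments, it is easy to show that

$-2\sum_{i} \frac{C_i Z_i}{\sigma_0^2} \stackrel{p}{\to} 0$ and $-2\sum_{i} \frac{D_i Z_i}{\sigma_0^2} \stackrel{p}{\to} 0$.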
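
The variance $1/(1 - \alpha_0^2)$ of the limit $N$ can be motivated by a direct calculation. Assuming i.i.d. noise with mean 0 and variance $\sigma_0^2$, the summands of $n^{-1/2}\sum_{i}\sum_{j<i} \alpha_0^{i-1-j} Z_j Z_i/\sigma_0^2$ are martingale differences, so the finite-sample variance is $n^{-1}\sum_{i}\sum_{j=-1}^{i-1} \alpha_0^{2(i-1-j)}$. The sketch below (hypothetical function name) evaluates this quantity.

```python
def finite_sample_variance(alpha0, n):
    """Variance of n^{-1/2} sum_{i=0}^{n} sum_{j=-1}^{i-1} alpha0^{i-1-j} Z_j Z_i / sigma0^2.

    Uses independence of the Z's: each inner sum has i + 1 terms, and the
    geometric series gives (1 - alpha0^(2(i+1))) / (1 - alpha0^2).
    """
    total = 0.0
    for i in range(0, n + 1):
        total += (1.0 - alpha0 ** (2 * (i + 1))) / (1.0 - alpha0 ** 2)
    return total / n


approx = finite_sample_variance(0.5, 5000)
limit = 1.0 / (1.0 - 0.5 ** 2)
```

For large $n$ the finite-sample value approaches $1/(1 - \alpha_0^2)$, which is also the stationary variance of the AR(1) sequence that appears in the cross-product argument.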

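For the second term in (3.5), writing

$\sum_{i} \frac{r_i^2}{\sigma_0^2} = \sum_{i} \frac{A_i^2 + B_i^2 + C_i^2 + D_i^2}{\sigma_0^2} + \sum_{i} \frac{2A_iB_i + 2A_iC_i + 2A_iD_i + 2B_iC_i + 2B_iD_i + 2C_iD_i}{\sigma_0^2}$

and using Corollary 2.10 in [13], we have

(3.9) $\sum_{i} \frac{A_i^2}{\sigma_0^2} \stackrel{d}{\to} \int_0^1 \Bigl(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \frac{\eta_1}{1 - \alpha_0}\,e^{\beta s}\Bigr)^2 ds$,

(3.10) $\sum_{i} \frac{B_i^2}{\sigma_0^2} \stackrel{p}{\to} \gamma^2\,\mathrm{var}(N)$.

Moreover, it is relatively easy to show that

(3.11) $\sum_{i} \frac{C_i^2}{\sigma_0^2} \stackrel{p}{\to} 0$ and $\sum_{i} \frac{D_i^2}{\sigma_0^2} \stackrel{p}{\to} 0$.

Next we show that all the cross product terms also vanish in the limit,
namely,

(3.12) $\sum_{i=-1}^{n} \frac{A_iB_i}{\sigma_0^2} \stackrel{p}{\to} 0, \quad \sum_{i=-1}^{n} \frac{A_iC_i}{\sigma_0^2} \stackrel{p}{\to} 0, \quad \ldots, \quad \sum_{i=-1}^{n} \frac{C_iD_i}{\sigma_0^2} \stackrel{p}{\to} 0$.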

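Here we only give the details for showing $\sum_{i=-1}^{n} A_iB_i/\sigma_0^2 \stackrel{p}{\to} 0$; the other cases
can be proved in an analogous manner. Notice that for any fixed $M > 0$ and
any $\beta \in [-M, 0]$,

$\sum_{i} \frac{A_iB_i}{\sigma_0^2} = \frac{(\beta/n)(1 - \alpha_0 + \beta/n)(\gamma/\sqrt{n})(1 - \alpha_0 - \gamma/\sqrt{n})}{(1 - \alpha_0 + \beta/n - \gamma/\sqrt{n})^2} \sum_{i} \Bigl(\sum_{j=-1}^{i-1} \theta^{i-1-j}\,\frac{Z_j}{\sigma_0}\Bigr)\Bigl(\sum_{j=-1}^{i-1} \alpha^{i-1-j}\,\frac{Z_j}{\sigma_0}\Bigr)$

$\qquad + \frac{(\gamma/\sqrt{n})(1 - \alpha_0 - \gamma/\sqrt{n})(\eta_1/\sqrt{n})}{(1 - \alpha_0 + \beta/n - \gamma/\sqrt{n})^2} \sum_{i} (\theta^{i+1} - \alpha^{i+1}) \sum_{j=-1}^{i-1} \alpha^{i-1-j}\,\frac{Z_j}{\sigma_0}$

(3.13) $\quad = \frac{\beta\gamma}{n\sqrt{n}} \sum_{i} \Bigl(\sum_{j=-1}^{i-1} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i-1-j} \frac{Z_j}{\sigma_0}\Bigr)\Bigl(\sum_{j=-1}^{i-1} \alpha_0^{i-1-j}\,\frac{Z_j}{\sigma_0}\Bigr) + \frac{\gamma\eta_1}{n(1 - \alpha_0)} \sum_{i} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i+1} \sum_{j=-1}^{i-1} \alpha_0^{i-1-j}\,\frac{Z_j}{\sigma_0}$

$\qquad - \frac{\gamma\eta_1}{n(1 - \alpha_0)} \sum_{i} \alpha_0^{i+1} \sum_{j=-1}^{i-1} \alpha_0^{i-1-j}\,\frac{Z_j}{\sigma_0} + o_p(1)$,

where $o_p(1)$ is uniform in $\beta$ and $\gamma$ on any compact set in $\mathbb{R} \times \mathbb{R}$. Setting
$R_i = \sum_{j=-1}^{i} \alpha_0^{i-j} Z_j/\sigma_0$, it follows that $R_i$ is a stationary AR(1) process
satisfying

$R_i = \alpha_0 R_{i-1} + Z_i/\sigma_0$.

Since $|\alpha_0| < 1$, we can apply Theorem 3.7 in Tanaka [21] to obtain

$S_n(t) := \frac{1}{\sqrt{n}} \sum_{j=0}^{[nt]} R_j \stackrel{d}{\to} aS(t)$,

where $a = \sum_{k=0}^{\infty} \alpha_0^k = \frac{1}{1 - \alpha_0}$ and $S(t)$ is a standard Brownian motion. Also,
since $R_i$ is adapted to the $\sigma$-fields $\mathcal{F}_i$ generated by $Z_0, \ldots, Z_i$, by
Theorem 2.1 in [13] we obtain

$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i+1} R_{i-1} \stackrel{d}{\to} a\int_0^1 e^{\beta s}\,dS(s)$ on $C[-M, 0]$.

Therefore,

(3.14) $\frac{\gamma\eta_1}{n(1 - \alpha_0)} \sum_{i} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i+1} R_{i-1} \stackrel{p}{\to} 0$ on $C[-M, 0]$.

It is also easy to see that

(3.15) $\frac{\gamma\eta_1}{n(1 - \alpha_0)} \sum_{i} \alpha_0^{i+1} R_{i-1} \stackrel{p}{\to} 0$ on $C[-M, 0]$.

The double sum

(3.16) $\frac{1}{n} \sum_{i} \sum_{j=-1}^{i-1} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i-1-j} \frac{Z_j}{\sigma_0}\,R_{i-1}$

is in the form of the double sum in Theorem 2.8 in [13], except that $\{R_i\}$
is no longer a martingale difference sequence. However, we can still follow
the proof of Theorem 2.8 in [13] and show that (3.16) has a nondegenerate
weak limit in $C[-M, 0]$. It follows that

(3.17) $\frac{\beta\gamma}{n\sqrt{n}} \sum_{i} \Bigl(\sum_{j=-1}^{i-1} \Bigl(1 + \frac{\beta}{n}\Bigr)^{i-1-j} \frac{Z_j}{\sigma_0}\Bigr) R_{i-1} \stackrel{p}{\to} 0$.

Thus, combining (3.14), (3.15) and (3.17), we conclude that the terms in (3.13)
go to 0 in probability on $C[-M, 0]$. The convergence in probability of the
other terms in (3.12) can also be proved in a similar way. To sum up, we
have shown the key stochastic process convergence result, that is,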

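$U_n(\beta, \gamma, \eta_1, \eta_2)$

(3.18) $\quad \stackrel{d}{\to} 2\beta\int_0^1\!\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\gamma N + \frac{2\eta_1}{1 - \alpha_0}\int_0^1 e^{\beta s}\,dW(s)$

$\qquad + \int_0^1 \Bigl(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \frac{\eta_1}{1 - \alpha_0}\,e^{\beta s}\Bigr)^2 ds + \gamma^2\,\mathrm{var}(N)$.

Using (3.18), one can easily derive the asymptotics for the exact profile
log-likelihood, denoted by $L_n(\beta, \gamma)$. In particular,

$L_n(\beta, \gamma) - L_n(0, 0)$

(3.19) $\quad \stackrel{d}{\to} \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(\beta, \gamma, \eta_1)}{2}\Bigr\}\,d\eta_1 - \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(0, 0, \eta_1)}{2}\Bigr\}\,d\eta_1$

(3.20) $\quad = -\gamma N - \frac{\gamma^2}{2}\,\mathrm{var}(N) + \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(\beta, \eta^*)}{2}\Bigr\}\,d\eta^* - \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(0, \eta^*)}{2}\Bigr\}\,d\eta^*$

$\quad := L^*(\beta, \gamma)$,

where $U(\beta, \gamma, \eta_1)$ denotes the limit in (3.18), $\eta^* = \eta_1/(1 - \alpha_0)$, and $U(\beta, \eta^*)$ is given by

(3.21) $U(\beta, \eta^*) = 2\int_0^1 \Bigl[\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \eta^* e^{\beta s}\Bigr] dW(s) + \int_0^1 \Bigl[\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \eta^* e^{\beta s}\Bigr]^2 ds$,

which is the limiting process of the joint likelihood obtained in the unit root
MA(1) case; see also Davis and Song [13]. We state the key result of this
paper in the following theorem.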

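Theorem 3.1. Consider the model given in (1.4) with two roots $\theta$ and $\alpha$
which are parameterized by

$\theta = 1 + \frac{\beta}{n}$ and $\alpha = \alpha_0 + \frac{\gamma}{\sqrt{n}}$.

Denote the profile log-likelihood based on a Gaussian likelihood as $L_n(\beta, \gamma)$.
Then $L_n(\beta, \gamma)$ satisfies

$L_n(\beta, \gamma) - L_n(0, 0) \stackrel{d}{\to} L^*(\beta, \gamma)$ on $C((-\infty, 0] \times \mathbb{R})$,

where

(3.22) $L^*(\beta, \gamma) = -\gamma N - \frac{\gamma^2}{2}\,\mathrm{var}(N) + U^*(\beta) \stackrel{d}{=} -\gamma N - \frac{\gamma^2}{2}\,\mathrm{var}(N) + Z_0(\beta)$.

The processes $U^*(\beta)$ and $Z_0(\beta)$ are defined by

(3.23) $U^*(\beta) = \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(\beta, \eta^*)}{2}\Bigr\}\,d\eta^* - \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(0, \eta^*)}{2}\Bigr\}\,d\eta^*$

and

(3.24) $Z_0(\beta) = \sum_{k=1}^{\infty} \cdots$.

Furthermore, there exists a sequence of local maxima $\hat\beta_n, \hat\gamma_n$ of $L_n(\beta, \gamma)$
converging in distribution to $\tilde\beta_{\mathrm{MLE}}, \tilde\gamma_{\mathrm{MLE}}$, the global maximum of the limiting
process $L^*(\beta, \gamma)$. If model (1.4) has, at most, one unit root, then for the
estimators $\hat c_1$ and $\hat c_2$, we have

(3.25) $\sqrt{n}\begin{pmatrix} \hat c_1 - c_1 \\ \hat c_2 - c_2 \end{pmatrix} \stackrel{d}{\to} N\left(0, \begin{pmatrix} 1 - c_2^2 & c_1(1 - c_2) \\ c_1(1 - c_2) & 1 - c_2^2 \end{pmatrix}\right)$.
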

Remark 3.2. The equivalence in distribution of the processes $U^*(\beta)$
and $Z_0(\beta)$ is given in Theorem 4.3 in Davis and Song [13]. As mentioned in
Davis and Dunsmuir [10], convergence on $C(-\infty, 0]$ does not necessarily imply
convergence of the corresponding global maximizers. Additional arguments
were required to show that the maximum likelihood estimator converged in
distribution to the global maximizer of the limit process. We suspect that
the same holds here for $\hat\beta_{\mathrm{MLE},n}$ and $\hat\gamma_{\mathrm{MLE},n}$, and simulation results, some of
which are contained in Sections 4 and 5, bear this out.

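Remark 3.3. To establish the convergence in (3.25), if there is exactly
one unit root, then

$\sqrt{n}(\hat c_1 - c_1) = -\frac{\hat\beta_{\mathrm{MLE}}}{\sqrt{n}} - \hat\gamma_{\mathrm{MLE}} \stackrel{d}{\to} -\tilde\gamma_{\mathrm{MLE}} = \frac{N}{\mathrm{var}(N)} \sim N(0, 1 - \alpha_0^2) = N(0, 1 - c_2^2)$,

$\sqrt{n}(\hat c_2 - c_2) = \hat\gamma_{\mathrm{MLE}} + \frac{\alpha_0 \hat\beta_{\mathrm{MLE}}}{\sqrt{n}} + \frac{\hat\gamma_{\mathrm{MLE}} \hat\beta_{\mathrm{MLE}}}{n} \stackrel{d}{\to} \tilde\gamma_{\mathrm{MLE}} = -\frac{N}{\mathrm{var}(N)} \sim N(0, 1 - \alpha_0^2) = N(0, 1 - c_2^2)$.

Here, we use the fact that $|\tilde\beta_{\mathrm{MLE}}| < \infty$ a.s., as stated in Theorem 4.3 in [13].
One can also calculate the limiting asymptotic covariance of $\hat c_1$ and $\hat c_2$ as

$-\mathrm{var}(\tilde\gamma_{\mathrm{MLE}}) = -(1 - \alpha_0^2) = -(1 + \alpha_0)(1 - \alpha_0) = c_1(1 - c_2)$.

Remark 3.4. The above theorem says that when $|\alpha_0| < 1$ and $\theta_0 = 1$,
we have a similar asymptotic result for $\hat c_1$ and $\hat c_2$ as in the invertible case.
If we only consider the original parameters $c_1$ and $c_2$, the effect of the unit
root disappears in the limit. But $\sqrt{n}(\hat c_1 - c_1)$ and $\sqrt{n}(\hat c_2 - c_2)$ are perfectly
dependent in the limit, since $c_1(1 - c_2) = -(1 - c_2^2)$.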

Remark 3.5. The estimated roots $\hat\theta$ and $\hat\alpha$ calculated from $\hat c_1$ and $\hat c_2$
are asymptotically independent. Interestingly, $\tilde\beta_{\mathrm{MLE}}$ corresponding to the
unit root in MA(2) has exactly the same distribution as the $\tilde\beta_{\mathrm{MLE}}$ in the
MA(1) case. So the pile-up and other properties of $\hat\beta_{\mathrm{MLE}}$ follow exactly from
those in the MA(1) case. It may seem surprising that the unit root in the
MA(2) model (when there is only one unit root) behaves asymptotically just
like the unit root in the MA(1) case. To see this, consider the situation where
we are given the parameter $\alpha$ and $\alpha = \alpha_0$. In this case, $\gamma = 0$ and

$L_n(\beta, 0) - L_n(0, 0) \stackrel{d}{\to} \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(\beta, \eta^*)}{2}\Bigr\}\,d\eta^* - \log\int_{-\infty}^{+\infty} \exp\Bigl\{-\frac{U(0, \eta^*)}{2}\Bigr\}\,d\eta^*$,

which is the limiting process of the exact profile log-likelihood in the MA(1)
case. On the other hand, when $\alpha$ is given, $\theta$ becomes the only parameter that
needs to be estimated in

(3.26) $X_t = (1 - \alpha_0 B)(1 - \theta B) Z_t$.

Because of the invertibility of the operator $1 - \alpha_0 B$, we can get an
intermediate process $Y_t$ by inverting the operator. Namely,

(3.27) $Y_t := \frac{1}{1 - \alpha_0 B}\,X_t = \sum_{j=0}^{\infty} \alpha_0^j X_{t-j} = (1 - \theta B) Z_t$.

Since we are dealing with asymptotics, inverting the operator $1 - \alpha_0 B$ is
feasible. Therefore, the transformed process $Y_t$ is indeed an MA(1) process
with the true parameter $\theta_0 = 1$. Then it follows naturally that the properties
of the estimator of $\theta$ in this situation should be equivalent to those of $\hat\theta$ in
a unit root MA(1) process.
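
The inversion in (3.27) can be illustrated with a short recursion. The sketch below is an assumption-laden illustration (hypothetical function name, unknown pre-sample value replaced by 0): it filters $X_t$ through $Y_t = X_t + \alpha_0 Y_{t-1}$ and records how far $Y_t$ is from the MA(1) sequence $Z_t - \theta Z_{t-1}$.

```python
import random


def invert_known_factor(n=200, alpha0=0.6, theta=1.0, seed=0):
    """Filter X_t = (1 - alpha0 B)(1 - theta B) Z_t by 1 / (1 - alpha0 B).

    Returns |Y_t - (Z_t - theta Z_{t-1})| for t = 1..n when the unknown
    starting value Y_0 is replaced by 0.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]

    def Z(j):
        # noise[k] stores Z_{k-1}
        return noise[j + 1]

    x = [Z(t) - (alpha0 + theta) * Z(t - 1) + alpha0 * theta * Z(t - 2)
         for t in range(1, n + 1)]
    y = 0.0
    errors = []
    for t, x_t in enumerate(x, start=1):
        y = x_t + alpha0 * y
        errors.append(abs(y - (Z(t) - theta * Z(t - 1))))
    return errors


errors = invert_known_factor()
```

The error at time $t$ equals $|\alpha_0|^t\,|Y_0|$, so the effect of the unknown starting value dies out geometrically, which is why the inversion is harmless for asymptotic purposes.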

3.2. MA(2) with two unit roots. In moving from the unit root problem
for the MA(1) model to the MA(2) model, several new and challenging
problems arise. In this subsection, we discuss some issues when there are
two unit roots in the MA polynomial.

3.2.1. Case 2: $c_2 = 1$ and $c_1 \ne \pm 2$. This corresponds to the case that the
true parameters are on the boundary $c_2 = 1$, that is, the boundary AC in
Figure 1, which means the two roots live on the unit circle and are not real
valued. Denote the two generic complex valued roots of the MA polynomial
by $\phi = re^{\mathrm{i}\theta}$ and $\bar\phi = re^{-\mathrm{i}\theta}$. To avoid confusion in notation, we use $\mathrm{i}$ to
represent $\sqrt{-1}$. A rather different representation of the residuals $r_i$ is used
in this case, that is,

— — — i—\ 

(3.28) - , J (zinit.o - Zq) 



1 



H ] 7 (^init,-l - Z. 



+ J^4> ^-" 

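We also adopt the parameterization for $r$, $\theta$ and two initial variables given by

$r = 1 + \frac{\beta}{n}$ and $\theta = \theta_0 + \frac{\gamma}{n}$,

$z_{\mathrm{init},0} = Z_0 + \frac{\sigma_0 \eta_1}{\sqrt{n}}$ and $z_{\mathrm{init},-1} = Z_{-1} + \frac{\sigma_0 \eta_2}{\sqrt{n}}$.

Again, we study the limiting process of $-2\sum_{i=-1}^{n} r_i Z_i/\sigma_0^2 + \sum_{i=-1}^{n} r_i^2/\sigma_0^2$. Here we
only present the first term of $\sum_{i=-1}^{n} r_i Z_i/\sigma_0^2$ for illustration; the limit of the other
terms can be derived in a similar fashion. By Theorem 2.8 in [13], we obtain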



^ Th t L 7 7 

i=0 j=-l 




dW(t)dW(s), 
where $\mathbf{W}(t)$ is a two-dimensional Brownian motion, $\mathbf{W}(t) = W_1(t) + \mathrm{i}W_2(t)$,
$\bar{\mathbf{W}}(t) = W_1(t) - \mathrm{i}W_2(t)$ and $W_1(t)$ and $W_2(t)$ are the corresponding weak
limits of the sums

$W_{1,n}(t) = \sum_{k=0}^{[nt]} \cos(k\theta_0)\,\frac{Z_k}{\sqrt{n}\,\sigma_0}$ and $W_{2,n}(t) = \sum_{k=0}^{[nt]} \sin(k\theta_0)\,\frac{Z_k}{\sqrt{n}\,\sigma_0}$.

The weak convergence of $W_{1,n}(t)$ and $W_{2,n}(t)$ to two independent Brownian
motions is guaranteed by Theorem 2.2 in Chan and Wei [7].
By Theorem 2.1 in [13] we have






Therefore, (3.28) leads to 



ido 



(3.29) 



2 N^ -^ — ;• 4K< (7COS0O — 7sin0o + /Scos^o + i/3 sin 6*0)6 

Jo Jo 

[ 2isin0o -^o J 

t 2isin0o -'o J 

where $\Re\{\cdot\}$ means the real part of a complex function. The weak limit of
$\sum_{i=-1}^{n} r_i^2/\sigma_0^2$ can also be computed in an analogous manner using
Corollary 2.10 in [13]. However, the weak limit of $\sum_{i=-1}^{n} r_i^2/\sigma_0^2$ has an even more
complicated form than (3.29).

By integrating out the auxiliary variables, the exact likelihood can be
recovered as well. However, the form of the joint likelihood function is
much more complicated than the one computed in the one unit root case.
The asymptotic properties and pile-up probabilities in this case remain
unknown.

3.2.2. Case 3: $c_2 = 1$ and $c_1 = -2$. This corresponds to the vertex A in
the $\nabla$ region in Figure 1. It is convenient to first consider a special case of
local asymptotics when the approach to the corner is through the boundary
$-c_1 - c_2 = 1$. With this constraint, the dimension of the parameters has been
reduced from two to one. We parameterize the MA(2) in this case by

(3.30) $X_t = Z_t - (\theta + 1) Z_{t-1} + \theta Z_{t-2}$

and define a $Z_{\mathrm{init}}$ and a $Y_{\mathrm{init}}$ as in (3.1), but with different normalization,
that is,

(3.31) 9 = 1 + ^, Y,^,, = Yo + ^ and Zi„it = ^-i + ^. 



Then, with the help of the theorems in Davis and Song [13], it follows that 

n „ n 2 



(3.32) 



A 2/3/ r el^^'-'Uw{t)dW{s) 
Jo Jo 



+ 2r?2 / e^^dVF(s)-^ / (l-e^^)dW^(s) 
Jo P Jo 

+ I (pT e^^'-'^ dW{t) + me"' - 2r/i ^—^) ds. 

There is a connection between this limiting process and the one in (2.5)
derived for an MA(1) model with a nonzero mean.
Notice that in (2.5), $U(\eta_0, \beta, \alpha)$ is exactly the process we just derived with $\eta_1$
and $\eta_2$ replaced by $\alpha$ and $\eta_0$. This leads us to an interesting connection between
the mean term in the lower order MA model and the initial value in the
higher order MA model, which we will discuss further in Section 6.

Alternatively, if we do not impose the constraint — ci — C2 = 1, there are 
two possible ways to parameterize the roots. First, the vertex can be ap- 
proached through the real region, where ci = —9 — a, 02 = 9a and the roots 
are parameterized further as 



9 = 1 -\ — and a = 1 -\ — , 
n n 



which makes 



ci = -l-H and C2 = H ho - 

\ n J n \n 

The second parameterization is through the complex region, in which the
roots are $re^{i\theta}$ and $re^{-i\theta}$ with $c_1 = -2r\cos(\theta)$, $c_2 = r^2$. The radius and the
angular parts are further parameterized as

    $r = 1 + \beta/n$ and $\theta = \gamma/n$,

which implies

    $c_1 = -1 - (1 + 2\beta/n) + o(1/n)$ and $c_2 = 1 + 2\beta/n + o(1/n)$.

Therefore, in either case, if we ignore the higher order terms and write $\beta$
for the resulting local parameter, $c_1$ and $c_2$ can be approximated as

    $c_1 = -1 - (1 + \beta/n)$ and $c_2 = 1 + \beta/n$.

This parameterization, however, is exactly the one we have seen in the
conditional case, which suggests that one of the unit roots has pile-up with
probability one asymptotically, while the other unit root behaves like the
unit root in the conditional case; see (3.30) and (3.32). This claim is also
supported by the simulation results; see Table 4 in Section 5.
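The two routes to the corner can also be checked numerically. The sketch below is only an illustration: the function names and the local parameters `b1`, `b2`, `beta` and `gamma` are assumptions of this example, not notation fixed by the text. It computes $(c_1, c_2)$ along each route together with the signed distance $-c_1 - c_2 - 1$ from the boundary, which is of order $1/n^2$ in both cases.

```python
import math

def real_route(b1, b2, n):
    """Coefficients when both real roots are local to 1: 1 + b1/n and 1 + b2/n."""
    theta, alpha = 1 + b1 / n, 1 + b2 / n
    return -(theta + alpha), theta * alpha

def complex_route(beta, gamma, n):
    """Coefficients for the conjugate pair r*exp(+/- i*angle), r = 1 + beta/n, angle = gamma/n."""
    r, angle = 1 + beta / n, gamma / n
    return -2 * r * math.cos(angle), r * r

def boundary_gap(c1, c2):
    """Signed distance of (c1, c2) from the boundary -c1 - c2 = 1."""
    return -c1 - c2 - 1

if __name__ == "__main__":
    for n in (50, 500, 5000):
        gap_real = boundary_gap(*real_route(-2.0, -3.0, n))
        gap_cplx = boundary_gap(*complex_route(-2.0, 3.0, n))
        # n^2 * gap stabilises as n grows, so the gap itself is O(1/n^2).
        print(n, n * n * gap_real, n * n * gap_cplx)
```

To first order in $1/n$, both routes therefore sit on the boundary $-c_1 - c_2 = 1$, which is why they cannot be distinguished from the conditional case at that order.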

4. Testing for a unit root in an MA(2) model. A direct application of
the results in the previous section is testing for the presence of a unit root
in the MA(2) model. For the testing problem, we extend the idea of a
generalized likelihood ratio test proposed in Davis, Chen and Dunsmuir [9] to
the MA(2) case. Tests based on $\hat\beta_{\mathrm{MLE}}$ are also considered in this section. We
will compare these tests with the score-type test of Tanaka [20].

To specify our hypothesis testing problem in the MA(2) case, the null
hypothesis is $H_0$: there is exactly one unit root in the MA polynomial, and
the alternative is $H_A$: there are no unit roots. The asymptotic theory of the
previous section allows us to approximate the nominal power against local
alternatives. To set up the problem, consider the model

    $X_t = Z_t - (\alpha + 1 + \beta/n)Z_{t-1} + \alpha(1 + \beta/n)Z_{t-2}$

with $|\alpha| < 1$. We want to test $H_0: \beta = 0$ versus $H_A: \beta < 0$.

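To describe the test based on the generalized likelihood ratio, let
$\mathrm{GLR}_n = 2(L_n(\hat\beta_{\mathrm{MLE}}, \hat\gamma_{\mathrm{MLE}}) - L_n(0, \hat\gamma_{\mathrm{MLE},0}))$, where $\hat\gamma_{\mathrm{MLE},0}$ is the MLE of $\gamma$ when
$\beta = 0$. An application of Theorem 3.1 gives
$\mathrm{GLR}_n \to_d L^*(\tilde\beta_{\mathrm{MLE}}, \tilde\gamma_{\mathrm{MLE}}) - L^*(0, \tilde\gamma_{\mathrm{MLE}}) = U^*(\tilde\beta_{\mathrm{MLE}})$,
where $L^*(\beta, \gamma)$ and $U^*(\beta)$ are given in (3.22) and (3.23), and
$\tilde\gamma_{\mathrm{MLE}} = -N/\mathrm{var}(N)$. Notice that the limit distribution of
$\mathrm{GLR}_n$ depends only on $\tilde\beta_{\mathrm{MLE}}$; $\gamma$ serves as a nuisance parameter, which
does not play a role in the limit. Define the $(1 - \alpha)$th asymptotic
quantiles $b_{\mathrm{GLR}}(\alpha)$ and $b_{\mathrm{MLE}}(\alpha)$ by

    $P(U^*(\tilde\beta_{\mathrm{MLE}}) > b_{\mathrm{GLR}}(\alpha)) = \alpha$ and $P(\tilde\beta_{\mathrm{MLE}} > b_{\mathrm{MLE}}(\alpha)) = \alpha$.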

Since the limiting random variables $U^*(\tilde\beta_{\mathrm{MLE}})$ and $\tilde\beta_{\mathrm{MLE}}$ are the same as
in the MA(1) unit root case, the critical values $b_{\mathrm{GLR}}(\alpha)$ and $b_{\mathrm{MLE}}(\alpha)$ are
the same as those provided in Table 3.2 of Davis, Chen and Dunsmuir [9].

Fig. 2. Power curve with respect to local alternatives when $\alpha = -0.3$ (upper) and when
$\alpha = -0.5$ (lower). Sample size $n = 50$. The size of the test is set to be 0.05.
(Horizontal axis in both panels: local alternatives.)

There has been limited research on testing for a unit root in the
MA(2) case. One approach, proposed by Tanaka, is based on a score-type
statistic, which is locally best invariant and unbiased (LBIU). However,
implementation of this test requires choosing a sequence $l_n \to \infty$ at a suitable
rate. One choice is $l_n = o(n^{1/4})$, yet this may not always work well, especially
if $\alpha > 0$; see also [20]. Next we compare the power curves of the three tests
for sample size $n = 50$.

Figure 2 shows the power curves of the MLE, GLR and LBIU
tests when the invertible root $\alpha$ in the MA(2) model is $-0.3$ and $-0.5$,
respectively. Since the score-type test of Tanaka is locally best invariant
unbiased, it has a very small edge over the GLR test up
to a local alternative of about 4. Thereafter, the GLR test outperforms
the LBIU test by an increasingly wide margin. When the sample size is 50, the
local alternative 4 corresponds to $\theta = 1 - 4/50 = 0.92$. Also, as seen
in Figure 2, the power function based on the MLE dominates the power
function of the LBIU test for local alternatives greater than 8 or 9.
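To make the correspondence between a local alternative and the actual MA(2) coefficients concrete, the following sketch maps a local alternative to the near-unit root and to the coefficients of the testing model of this section. The function name is hypothetical, and the sketch assumes that a local alternative of size $k$ on the horizontal axis of Figure 2 means $\beta = -k$.

```python
def local_alternative_coefficients(k, n, alpha):
    """Map local alternative k (assumed to mean beta = -k) to theta and (c1, c2).

    The model is X_t = Z_t + c1*Z_{t-1} + c2*Z_{t-2} with
    c1 = -(alpha + 1 + beta/n) and c2 = alpha*(1 + beta/n).
    """
    beta = -float(k)
    theta = 1 + beta / n          # near-unit root
    c1 = -(alpha + theta)         # coefficient of Z_{t-1}
    c2 = alpha * theta            # coefficient of Z_{t-2}
    return theta, c1, c2

theta, c1, c2 = local_alternative_coefficients(k=4, n=50, alpha=-0.3)
print(round(theta, 2))  # 0.92
```

With $n = 50$ even a local alternative of 9 still corresponds to a root of 0.82, so the alternatives on the right of the power curves remain fairly close to the unit circle.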

When $\alpha > 0$, especially for small sample sizes such as 50, the tests
based on the MLE and on the LBIU statistic behave very poorly. This is because
when $\alpha > 0$ and there is one unit root, the two parameters $c_1$ and $c_2$ lie on
the boundary $-c_1 - c_2 = 1$, which is close to the complex region boundary
$c_1^2 - 4c_2 = 0$. But our asymptotic results are derived under the assumption
that the two roots approach the limit only through the real region.
This holds asymptotically, but in finite samples, when we maximize the
likelihood jointly over $c_1$ and $c_2$, it is likely that the two maximizers
fall into the complex region. As $\alpha$ gets closer to $-1$ this effect becomes more
severe. Thus we do not recommend using the test based on the MLE when
the invertible root is likely to be negative. The test based on the MLE
usually has an inflated size. The LBIU test is not good in this case either,
as pointed out in Tanaka [20]. The upper tail probabilities are greatly
underestimated when $\alpha$ gets closer to $-1$, and hence $H_0$ tends to be accepted
much more often. Simulation results show that when the sample size is 50
and the true $\alpha$ is 0.3 and 0.5, the corresponding sizes of the LBIU test are
0.0119 and 0.0015, which are much smaller than the nominal size 0.05. The GLR
test seems to be the best among the three choices. This is due to the fact that
the GLR only considers the maximum value of the likelihood ratio instead
of the MLE of $c_1$ and $c_2$. Therefore, even if $c_1$ and $c_2$ are in the complex
region, the GLR test can still be carried out, whereas the test based on $\hat\beta_{\mathrm{MLE}}$
is not even well defined in this case. Although the size of the GLR test is
often slightly greater than the nominal size, the GLR test gives the best
performance in this situation.

Fig. 3. Power curve with respect to local alternatives when $\alpha = 0$. Sample size $n = 50$.
The size of the test is set to be 0.05. (Curves: GLR, MLE and LBIU tests for the
MA(2) and MA(1) models; horizontal axis: local alternatives.)

Finally, we compare these tests when a = 0; that is, the model is in fact 
a unit root MA(1). The test developed for the MA(2) case is still applicable. 
The results are summarized in Figure 3. Clearly, the power functions of 
the tests designed for the MA(1) dominate the power functions of their 
counterparts designed for the MA(2). However, it is surprising that for large 
local alternatives (greater than 9 or so), the GLR for the MA(2) model 
outperforms the LBIU for the MA(1) model. 

5. Numerical simulations. In this section, we present simulation results
that illustrate the theory from Section 3. Realizations were simulated from
the MA(2) process given by

(5.1) $X_t = Z_t - (1 + \alpha)Z_{t-1} + \alpha Z_{t-2}$,

where $\alpha$ takes the values 0.3, 0 and $-0.3$, respectively. The MA(2) model
was replicated 10,000 times for each choice of $\alpha$, and the MLEs of
the MA(2) coefficients $c_1$ and $c_2$ were calculated for each replicate. The
empirical pile-up probability and the empirical variance and MSE of the MLEs
are reported in Tables 1 to 3. Note that the variance and the MSE in the
tables are reported for the normalized estimates $\sqrt{n}(\hat c_i - c_i)$,
$i = 1, 2$.
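One way such realizations could be generated is sketched below. The source does not describe its simulation code, so the Gaussian noise, the unit noise variance, the function name and the seeds are all assumptions of this example. Because of the unit root, $X_t = V_t - V_{t-1}$ with $V_t = Z_t - \alpha Z_{t-1}$, so the sum of a realization telescopes to $V_n - V_0$ and stays bounded in probability, which gives a quick sanity check.

```python
import random

def simulate_ma2_unit_root(n, alpha, seed=0, sigma=1.0):
    """Simulate X_1..X_n from X_t = Z_t - (1 + alpha) Z_{t-1} + alpha Z_{t-2}.

    Gaussian noise is an assumption of this sketch; the model only needs i.i.d. Z_t.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, sigma) for _ in range(n + 2)]   # Z_{-1}, Z_0, ..., Z_n
    return [z[s] - (1 + alpha) * z[s - 1] + alpha * z[s - 2] for s in range(2, n + 2)]

if __name__ == "__main__":
    for alpha in (0.3, 0.0, -0.3):
        x = simulate_ma2_unit_root(50, alpha, seed=1)
        # The sum telescopes, so it does not grow with n.
        print(alpha, len(x), round(sum(x), 3))
```

Computing the MLEs of $c_1$ and $c_2$ for each replicate requires a numerical optimizer over the closed invertibility region and is not attempted here.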
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Table 1 
Summary of the case: $\alpha = 0.3$

Sample   Pile-up       Variance        MSE             Variance        MSE             Correlation of
size     probability   of $\hat c_1$   of $\hat c_1$   of $\hat c_2$   of $\hat c_2$   $\hat c_1$ and $\hat c_2$
25       0.5436        2.1701          2.1970          2.4455          2.6536          0.9347
50       0.6041        1.4063          1.4118          1.4967          1.5553          0.9644
100      0.6234        1.1108          1.1108          1.1490          1.1636          0.9815
400      0.6398        0.9788          0.9788          0.9854          0.9890          0.9953
1,000    0.6437        0.9290          0.9290          0.9327          0.9338          0.9981



Table 2
Summary of the case: $\alpha = 0$ [MA(1) with a unit root]

Sample   Pile-up       Variance        MSE             Variance        MSE             Correlation of
size     probability   of $\hat c_1$   of $\hat c_1$   of $\hat c_2$   of $\hat c_2$   $\hat c_1$ and $\hat c_2$
25       0.5870        2.1624          2.1629          2.5037          2.6355          0.8792
50       0.6182        1.3661          1.3670          1.4690          1.5053          0.9378
100      0.6220        1.1661          1.1670          1.2082          1.2224          0.9662
400      0.6318        1.0440          1.0441          1.0544          1.0578          0.9918
1,000    0.6334        1.0329          1.0330          1.0351          1.0384          0.9966








Table 3
Summary of the case: $\alpha = -0.3$

Sample   Pile-up       Variance        MSE             Variance        MSE             Correlation of
size     probability   of $\hat c_1$   of $\hat c_1$   of $\hat c_2$   of $\hat c_2$   $\hat c_1$ and $\hat c_2$
25       0.6171        1.8370          1.8806          2.1654          2.2287          0.7950
50       0.6347        1.2820          1.3053          1.3647          1.3820          0.8938
100      0.6447        1.0748          1.0853          1.1215          1.1299          0.9397
400      0.6472        0.9245          0.9267          0.9316          0.9339          0.9822
1,000    0.6511        0.9232          0.9242          0.9256          0.9263          0.9933



As seen in the tables, the correlation of $\hat c_1$ and $\hat c_2$ increases to 1 with
the sample size. The variances and the MSEs converge to the theoretical
value $1 - c_2^2$. As pointed out in [10] and [9], the asymptotic results
work remarkably well even for small sample sizes in the MA(1) case. Here,
although the limiting pile-up probability is still 0.6518, the rates of
convergence vary depending on $\alpha$: for $\alpha > 0$ the rate is slow, while for
$\alpha < 0$ it is much faster. From the derivation of the asymptotic results,
there are error terms in the likelihood that vanish asymptotically and
contribute to a more lethargic rate of convergence. Again, the asymptotic
results were derived assuming the roots are always in the real region, which
only holds asymptotically. When the sample size is small and $\alpha > 0$, the
MLEs of $c_1$ and $c_2$ are more likely to be in the complex region than when
$\alpha < 0$. Thus the limiting process approximates the likelihood function
poorly when $\alpha > 0$, which in turn results in less pile-up in smaller samples.

Table 4
Pile-up probabilities for the case: $c_1 = -2$

Sample size   Pile-up probability
100           0.246
500           0.804
1,000         0.961
5,000         0.999
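As a quick arithmetic check of the limiting value $1 - c_2^2$ quoted above, the sketch below evaluates it for the three simulated models, using $c_2 = \alpha$ from (5.1), and prints it next to the empirical variance of the normalized estimate of $c_2$ at $n = 1{,}000$ copied from Tables 1 to 3. The function name is an assumption of this example.

```python
def asymptotic_variance(alpha):
    """Theoretical limit 1 - c2^2 for model (5.1), where c2 = alpha."""
    return 1 - alpha ** 2

# Empirical variance of the normalized estimate of c2 at n = 1,000 (Tables 1 to 3).
reported = {0.3: 0.9327, 0.0: 1.0351, -0.3: 0.9256}

for alpha, value in reported.items():
    print(alpha, round(asymptotic_variance(alpha), 4), value)
```

The theoretical values are 0.91, 1 and 0.91, so at $n = 1{,}000$ the simulated variances are within about 0.04 of their limits in all three cases.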

Table 4 summarizes the pile-up effects for the model considered in
Section 3.2.2, where the two roots of the MA polynomial are both 1. In one
realization, the estimators are said to exhibit a pile-up if the MLEs of $c_1$
and $c_2$ are on the boundary $-c_1 - c_2 = 1$.

As seen in the table, the pile-up probability increases to 1 with the sample
size. However, the claimed 100% probability of pile-up is not a good
approximation for small sample sizes. Even when $n = 500$, the pile-up
probability is only about 80%.


6. Unit roots and differencing. As pointed out in Section 3.2.2, there is
a link between the mean term in the lower order MA model and the initial
value in the higher order MA model. To illustrate this, consider the simple
case when

    $Y_t = \mu_0 + Z_t$,

where $\{Z_t\} \sim$ i.i.d. $(0, \sigma_0^2)$. So $\{Y_t\}$ is an i.i.d. sequence with a common mean.
It is clear that

    $\sqrt{n}(\hat\mu - \mu_0) \to_d N(0, \sigma_0^2)$,

where $\hat\mu$ is the MLE of $\mu_0$ obtained by maximizing the Gaussian
likelihood function. Now suppose we difference the time series to obtain

    $X_t = (1 - B)Y_t = Z_t - Z_{t-1}$,

which is an MA(1) process with a unit root. The initial value, as
defined before, of this differenced process is

    $Z_{\mathrm{init}} = Z_0 = Y_0 - \mu_0$.

From the results in Theorem 4.2 in [13], if it is known that an MA(1) time
series has a unit root, that is, $\beta = 0$, we have

    $U(\beta = 0, \alpha) = 2\alpha W(1) + \alpha^2$.

Clearly, $\hat\alpha = -W(1)$, and with our parameterization of $z_{\mathrm{init}}$ we have

    $\hat\alpha = \sqrt{n}(\hat z_{\mathrm{init}} - Z_0)/\sigma_0 = \sqrt{n}(Y_0 - \hat\mu - Y_0 + \mu_0)/\sigma_0$,

which is consistent with the classical result, since it gives
$\sqrt{n}(\hat\mu - \mu_0) = -\sigma_0\hat\alpha$, whose limit $\sigma_0 W(1)$ is $N(0, \sigma_0^2)$. Therefore we can conclude that
whenever we have an MA model with a unit root, the information stored
in the initial value comes from the information about the mean term in the
undifferenced series. So differencing the series does not get rid of the mean
parameter; instead, differencing creates a new parameter $z_{\mathrm{init}}$, which behaves
like the mean in the undifferenced series and whose effect persists even
asymptotically. With this, we can now easily explain the result in (2.9). Consider
a slightly more complicated model consisting of i.i.d. noise and a linear trend,
that is,

(6.1) $Y_t = \mu_0 + b_0 t + Z_t$,

which, after differencing, delivers an MA(1) model with a unit root and
a nonzero mean given by

    $X_t = (1 - B)Y_t = b_0 + Z_t - Z_{t-1}$.

From (2.9), we know $n^{3/2}(\hat b - b_0) \to_d N(0, 12\sigma_0^2)$. But this can be obtained
much more easily by analyzing the model (6.1). This is just a simple
application of linear regression, and we get exactly the same asymptotic
result for $\hat b$.
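The regression route to the constant 12 can be checked directly. For ordinary least squares in a model with an intercept and a linear trend observed at $t = 1, \ldots, n$, the slope estimator has variance $\sigma^2 / \sum_t (t - \bar t)^2$ and $\sum_t (t - \bar t)^2 = n(n^2 - 1)/12$. The sketch below, whose function name and time indexing are assumptions of this example, evaluates $n^3$ times that variance.

```python
def scaled_slope_variance(n, sigma2=1.0):
    """n^3 * Var(b-hat) for OLS on Y_t = mu + b*t + Z_t, t = 1..n, Var(Z_t) = sigma2."""
    t_bar = (n + 1) / 2
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))   # equals n(n^2 - 1)/12
    return n ** 3 * sigma2 / sxx

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Approaches 12 * sigma2 as n grows.
        print(n, scaled_slope_variance(n))
```

The scaled variance equals $12\sigma^2 n^2/(n^2 - 1)$ exactly, which converges to $12\sigma^2$ and matches the limit variance in (2.9).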

Now consider the model from Section 2,

    $Y_t = b_0 + Z_t - \theta Z_{t-1}$,

where $\theta = 1 + \beta/n$ is near or on the unit circle. By differencing we obtain

    $X_t = (1 - B)Y_t = Z_t - (1 + \theta)Z_{t-1} + \theta Z_{t-2}$.

If we define $Z_{\mathrm{init}}$ as before and

    $Y_{\mathrm{init}} = Y_0 = b_0 + Z_0 - \theta Z_{-1}$,

then $y_{\mathrm{init}} - Y_{\mathrm{init}}$ can be viewed as $\hat b - b_0$. Since $\hat b$ converges at the rate $n^{3/2}$,
so does $y_{\mathrm{init}}$. This explains the parametrization given in (3.31) as well as the
resemblance of (2.5) and (3.32).
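The link between the initial value and the mean can be seen numerically in the simplest case $Y_t = \mu_0 + Z_t$ treated at the start of this section. With the unit root known, the residuals are $z_t = z_{\mathrm{init}} + S_t$, where $S_0 = 0$ and $S_t = X_1 + \cdots + X_t$, and minimizing $\sum_{t=0}^{n} z_t^2$ over $z_{\mathrm{init}}$ gives minus the average of the partial sums. The sketch below, in which the function name, the Gaussian noise and the numerical values are assumptions of this example, confirms that this minimizer equals $Y_0 - \bar Y$.

```python
import random

def best_initial_value(x):
    """Least-squares z_init for a unit-root MA(1).

    Residuals are z_t = z_init + S_t with S_0 = 0 and S_t = X_1 + ... + X_t,
    so the minimiser of sum_{t=0}^{n} z_t^2 is minus the mean of S_0, ..., S_n.
    """
    partial, total = 0.0, 0.0
    for xt in x:
        partial += xt
        total += partial
    return -total / (len(x) + 1)

rng = random.Random(1)
mu0, n = 5.0, 200
y = [mu0 + rng.gauss(0.0, 1.0) for _ in range(n + 1)]   # Y_0, ..., Y_n
x = [y[t] - y[t - 1] for t in range(1, n + 1)]           # X_t = (1 - B) Y_t
z_init_hat = best_initial_value(x)
mu_hat = sum(y) / len(y)                                  # sample mean of Y_0, ..., Y_n
print(abs(z_init_hat - (y[0] - mu_hat)) < 1e-9)           # True
```

The differenced data never contain $\mu_0$, yet the estimated initial value reproduces the sample mean of the undifferenced series up to the known shift $Y_0$.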

7. Going beyond second order. The techniques proposed in this paper
can be adapted to handle the unit root problem for MA($q$) with $q \ge 3$.
However, the complexity of the argument, mostly in terms of bookkeeping,
also increases with the order $q$. In this section, we outline the procedure
for the MA(3) case, from which extensions to larger orders are
straightforward.

Suppose $\{X_t\}$ follows an MA(3) model, which is parameterized in terms
of the reciprocals of the zeros of the MA polynomial, that is,

      $X_t = Z_t - (\theta_0 + \phi_0 + \psi_0)Z_{t-1} + (\theta_0\phi_0 + \theta_0\psi_0 + \phi_0\psi_0)Z_{t-2} - \theta_0\phi_0\psi_0 Z_{t-3}$
(7.1)     $= (1 - \theta_0 B)(1 - \phi_0 B)(1 - \psi_0 B)Z_t$
          $= (1 - \theta_0 B)(1 - \phi_0 B)Y_t$
          $= (1 - \theta_0 B)W_t$.
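The expansion in (7.1) is a product of first-order factors and can be reproduced by repeated polynomial multiplication. In the sketch below the function name and the numerical values of the three reciprocal roots are assumptions chosen only for illustration.

```python
def ma_coefficients(inverse_roots):
    """Expand prod_k (1 - r_k B) and return the coefficients [1, c1, c2, ...]."""
    coeffs = [1.0]
    for r in inverse_roots:
        shifted = [0.0] + [-r * c for c in coeffs]           # (-r B) times the current polynomial
        coeffs = [a + b for a, b in zip(coeffs + [0.0], shifted)]
    return coeffs

theta, phi, psi = 1.0, 0.5, -0.3
c = ma_coefficients([theta, phi, psi])
# Compare with the closed forms in (7.1).
expected = [1.0,
            -(theta + phi + psi),
            theta * phi + theta * psi + phi * psi,
            -theta * phi * psi]
print(all(abs(a - b) < 1e-12 for a, b in zip(c, expected)))  # True
```

The same function handles any order $q$, which is convenient when checking the bookkeeping for higher-order models.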

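For simplicity, assume $\theta_0 \ne \phi_0 \ne \psi_0$. Now we form two intermediate
processes $Y_t$ and $W_t$ and consider three augmented initial variables defined
by $Z_{\mathrm{init}} = Z_{-2}$, $Y_{\mathrm{init}} = Z_{-1} - \psi_0 Z_{\mathrm{init}}$ and $W_{\mathrm{init}} = Y_0 - \phi_0 Y_{\mathrm{init}}$, so that
$Y_{\mathrm{init}} = Y_{-1}$ and $W_{\mathrm{init}} = W_0$. Arguments similar to those in Section 3 show that the
joint likelihood of $(\mathbf{X}_n, W_{\mathrm{init}}, Y_{\mathrm{init}}, Z_{\mathrm{init}})$
has a simple form given by

    $f_{\mathbf{X}_n, W_{\mathrm{init}}, Y_{\mathrm{init}}, Z_{\mathrm{init}}}(\mathbf{x}_n, w_{\mathrm{init}}, y_{\mathrm{init}}, z_{\mathrm{init}}) = \prod_{j=-2}^{n} f_Z(z_j)$.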

As in the MA(1) and MA(2) cases, maximizing this joint likelihood is
essentially equivalent to minimizing the objective function

    $U_n = \frac{1}{\sigma_0^2} \sum_{i=-2}^{n} (z_i^2 - Z_i^2)$.

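The key to this analysis is to write out the explicit expression for $z_i$, which
is basically an estimator for $Z_i$. The following equations are straightforward
to derive:

(7.2) $w_k = \sum_{l=1}^{k} \theta^{k-l} X_l + \theta^{k} w_{\mathrm{init}}$,

(7.3) $y_j = \sum_{k=1}^{j} \phi^{j-k} w_k + \phi^{j} w_{\mathrm{init}} + \phi^{j+1} y_{\mathrm{init}}$,

(7.4) $z_i = \sum_{j=1}^{i} \psi^{i-j} y_j + \psi^{i} w_{\mathrm{init}} + \psi^{i}(\phi + \psi) y_{\mathrm{init}} + \psi^{i+2} z_{\mathrm{init}}$.

Plugging (7.2) into (7.3), we obtain

(7.5) $y_j = \sum_{k=1}^{j} \frac{\theta^{j-k+1} - \phi^{j-k+1}}{\theta - \phi} X_k + \frac{\theta^{j+1} - \phi^{j+1}}{\theta - \phi} w_{\mathrm{init}} + \phi^{j+1} y_{\mathrm{init}}$,

and plugging this into (7.4), we obtain

    $z_i = \sum_{k=1}^{i} \bigl(\theta^{i-k+2}(\phi - \psi) + \phi^{i-k+2}(\psi - \theta) + \psi^{i-k+2}(\theta - \phi)\bigr)$
           $\times ((\theta - \psi)(\psi - \phi)(\phi - \theta))^{-1} X_k$
         $+ \Bigl(\frac{\theta^2(\theta^i - \psi^i)}{(\theta - \phi)(\theta - \psi)} - \frac{\phi^2(\phi^i - \psi^i)}{(\theta - \phi)(\phi - \psi)} + \psi^i\Bigr) w_{\mathrm{init}}$
         $+ \frac{\phi^{i+2} - \psi^{i+2}}{\phi - \psi} y_{\mathrm{init}} + \psi^{i+2} z_{\mathrm{init}}$.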
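The recursions behind (7.2) to (7.4) amount to inverting the three first-order factors of (7.1) one at a time. The sketch below is an illustration under stated assumptions: it takes $W_{\mathrm{init}} = W_0$, $Y_{\mathrm{init}} = Y_{-1}$ and $Z_{\mathrm{init}} = Z_{-2}$, uses Gaussian noise, and all function names and parameter values are hypothetical. When the true parameters and true initial values are supplied, the recursion returns the noise sequence exactly, which is what makes $z_i$ an estimator of $Z_i$.

```python
import random

def ma3_from_roots(z, theta, phi, psi):
    """X_t = (1 - theta B)(1 - phi B)(1 - psi B) Z_t for t = 1..n.

    z holds Z_{-2}, Z_{-1}, Z_0, Z_1, ..., Z_n, so z[t + 2] is Z_t.
    """
    c1 = -(theta + phi + psi)
    c2 = theta * phi + theta * psi + phi * psi
    c3 = -theta * phi * psi
    return [z[t + 2] + c1 * z[t + 1] + c2 * z[t] + c3 * z[t - 1]
            for t in range(1, len(z) - 2)]

def recover_noise(x, theta, phi, psi, w_init, y_init, z_init):
    """Invert the factors one at a time: X -> W -> Y -> Z."""
    w_prev = w_init                                   # W_0
    y_prev = w_init + phi * y_init                    # Y_0 = W_0 + phi * Y_{-1}
    z_prev = y_prev + psi * (y_init + psi * z_init)   # Z_0 = Y_0 + psi * Z_{-1}
    out = []
    for xt in x:
        w = xt + theta * w_prev
        y = w + phi * y_prev
        zt = y + psi * z_prev
        out.append(zt)
        w_prev, y_prev, z_prev = w, y, zt
    return out

rng = random.Random(7)
theta, phi, psi = 1.0, 0.5, -0.3                      # one unit root, two invertible roots
z = [rng.gauss(0.0, 1.0) for _ in range(103)]         # Z_{-2}, ..., Z_100
x = ma3_from_roots(z, theta, phi, psi)
z_init = z[0]                                         # Z_{-2}
y_init = z[1] - psi * z[0]                            # Y_{-1}
w_init = (z[2] - psi * z[1]) - phi * y_init           # W_0 = Y_0 - phi * Y_{-1}
z_hat = recover_noise(x, theta, phi, psi, w_init, y_init, z_init)
print(max(abs(a - b) for a, b in zip(z_hat, z[3:])) < 1e-9)  # True
```

Perturbing any of the six arguments of `recover_noise` away from its true value produces the residual differences that enter the objective function $U_n$.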

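While this expression looks more complicated than the one encountered
in the MA(2) case, the coefficient of each $X_k$ in the sum looks very similar
to (3.2), only with more terms. Now replacing each $X_k$ with (7.1), $z_i$ can be
written as

(7.6) $z_i = Z_i - \sum_{j=-2}^{i-1} C_{ij} Z_j - C_i^{w}(w_{\mathrm{init}} - W_{\mathrm{init}}) - C_i^{y}(y_{\mathrm{init}} - Y_{\mathrm{init}}) - C_i^{z}(z_{\mathrm{init}} - Z_{\mathrm{init}})$,

where $C_{ij}$ is the coefficient for $Z_j$ in $z_i$ and is a combination of $\theta^{i-j}$, $\phi^{i-j}$
and $\psi^{i-j}$, and $C_i^{w}$, $C_i^{y}$ and $C_i^{z}$ are the coefficients for $w_{\mathrm{init}} - W_{\mathrm{init}}$, $y_{\mathrm{init}} - Y_{\mathrm{init}}$ and
$z_{\mathrm{init}} - Z_{\mathrm{init}}$. They are linear combinations of $\theta^i$, $\phi^i$ and $\psi^i$. For illustration,
assume the MA(3) model has only one unit root, with $|\phi_0| < 1$, $|\psi_0| < 1$ and
$\theta_0 = 1$. We can then reparameterize the parameters as

    $\theta = 1 + \beta/n$, $\beta \le 0$, $\phi = \phi_0 + \alpha/\sqrt{n}$ and $\psi = \psi_0 + \gamma/\sqrt{n}$,

and the initial values as

    $w_{\mathrm{init}} = W_{\mathrm{init}} + \sigma_0\eta_w/\sqrt{n}$, $y_{\mathrm{init}} = Y_{\mathrm{init}} + \sigma_0\eta_y/\sqrt{n}$ and $z_{\mathrm{init}} = Z_{\mathrm{init}} + \sigma_0\eta_z/\sqrt{n}$.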

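Then the objective function $U_n$ becomes

(7.7) $U_n(\beta, \alpha, \gamma, \eta_w, \eta_y, \eta_z) = \sum_{i=-2}^{n} \frac{z_i^2}{\sigma_0^2} - \sum_{i=-2}^{n} \frac{Z_i^2}{\sigma_0^2}$
        $= -2 \sum_{i=-2}^{n} \frac{Z_i}{\sigma_0} \Bigl( \sum_{j=-2}^{i-1} \frac{C_{ij} Z_j}{\sigma_0} + \frac{C_i^{w}\eta_w}{\sqrt{n}} + \frac{C_i^{y}\eta_y}{\sqrt{n}} + \frac{C_i^{z}\eta_z}{\sqrt{n}} \Bigr)$
          $+ \sum_{i=-2}^{n} \Bigl( \sum_{j=-2}^{i-1} \frac{C_{ij} Z_j}{\sigma_0} + \frac{C_i^{w}\eta_w}{\sqrt{n}} + \frac{C_i^{y}\eta_y}{\sqrt{n}} + \frac{C_i^{z}\eta_z}{\sqrt{n}} \Bigr)^2$.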

Because of the special structure of $C_{ij}$, $C_i^{w}$, $C_i^{y}$ and $C_i^{z}$, the sum in (7.7)
consists of terms that have a similar structure to quantities like



E 1 + - -- ^^^d V 1 + 



that were used in the MA(1) and MA(2) cases. By using a martingale
central limit theorem and theorems proved in Davis and Song [13], one can
establish the weak convergence of $U_n(\beta, \alpha, \gamma, \eta_w, \eta_y, \eta_z)$ to a random
element $U(\beta, \alpha, \gamma, \eta_w, \eta_y, \eta_z)$ in $C(\mathbb{R}^6)$. Now arguing as in Section 3, the initial
variables can be integrated out, and the limiting process of the exact profile
log-likelihood can be established.

For general $q > 3$, the residual $r_i = Z_i - z_i$ has the form

    $r_i = \sum_{j=-q+1}^{i-1} C_{ij} Z_j + \sum_{k=1}^{q} C_i^{k}(\mathrm{init}_k - \mathrm{INIT}_k)$,

where $\{\mathrm{INIT}_1, \ldots, \mathrm{INIT}_q\}$ are $q$ augmented initial variables, defined either
through the i.i.d. random variables $Z_t$ or through intermediate processes
like $Y_t$ in the above example. Furthermore, $C_{ij}$ is only a linear
combination of $\{\theta_1^{i-j}, \ldots, \theta_q^{i-j}\}$, where $(\theta_1, \ldots, \theta_q)$ are the reciprocals of the roots of the
MA($q$) polynomial. The coefficients $C_i^{k}$, $k = 1, \ldots, q$, are only linear
combinations of $\{\theta_1^{i}, \ldots, \theta_q^{i}\}$. This special structure of $r_i$ allows us to apply the weak
convergence theorems proved in Davis and Song [13] to find the limiting
process of $U_n = -2\sum_{i=-q+1}^{n} r_i Z_i/\sigma_0^2 + \sum_{i=-q+1}^{n} r_i^2/\sigma_0^2$, from which the limiting
behavior of the maximum likelihood estimators of the $\theta_j$'s can be derived.